
EURASIP Journal on Applied Signal Processing

Genetic and Evolutionary Computation for Signal Processing and Image Analysis

Guest Editors: Riccardo Poli and Stefano Cagnoni


Copyright © 2003 Hindawi Publishing Corporation. All rights reserved.


This is a special issue published in volume 2003 of EURASIP Journal on Applied Signal Processing. All articles are open access
articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction
in any medium, provided the original work is properly cited.

Editor-in-Chief
Marc Moonen, Belgium

Senior Advisory Editor


K. J. Ray Liu, College Park, USA

Associate Editors
Kiyoharu Aizawa, Japan
Gonzalo Arce, USA
Jaakko Astola, Finland
Kenneth Barner, USA
Mauro Barni, Italy
Sankar Basu, USA
Jacob Benesty, Canada
Helmut Bölcskei, Switzerland
Chong-Yung Chi, Taiwan
M. Reha Civanlar, Turkey
Tony Constantinides, UK
Luciano Costa, Brazil
Zhi Ding, USA
Petar M. Djurić, USA
Jean-Luc Dugelay, France
Tariq Durrani, UK
Touradj Ebrahimi, Switzerland
Sadaoki Furui, Japan
Moncef Gabbouj, Finland
Fulvio Gini, Italy

A. Gorokhov, The Netherlands


Peter Händel, Sweden
Ulrich Heute, Germany
John Homer, Australia
Jiri Jan, Czech
Søren Holdt Jensen, Denmark
Mark Kahrs, USA
Ton Kalker, The Netherlands
Mos Kaveh, USA
Bastiaan Kleijn, Sweden
Ut-Va Koc, USA
Aggelos Katsaggelos, USA
C.-C. Jay Kuo, USA
Chin-Hui Lee, USA
Kyoung Mu Lee, Korea
Sang Uk Lee, Korea
Y. Geoffrey Li, USA
Ferran Marqués, Spain
Bernie Mulgrew, UK
King N. Ngan, Singapore

Naohisa Ohta, Japan


Antonio Ortega, USA
Mukund Padmanabhan, USA
Ioannis Pitas, Greece
Phillip Regalia, France
Hideaki Sakai, Japan
Wan-Chi Siu, Hong Kong
Dirk Slock, France
Piet Sommen, The Netherlands
John Sorensen, Denmark
Michael G. Strintzis, Greece
Tomohiko Taniguchi, Japan
Sergios Theodoridis, Greece
Xiaodong Wang, USA
Douglas Williams, USA
An-Yen (Andy) Wu, Taiwan
Xiang-Gen Xia, USA
Kung Yao, USA

Contents
Foreword, David E. Goldberg
Volume 2003 (2003), Issue 8, Pages 731-732
Editorial, Riccardo Poli and Stefano Cagnoni
Volume 2003 (2003), Issue 8, Pages 733-739
Blind Search for Optimal Wiener Equalizers Using an Artificial Immune Network Model,
Romis Ribeiro de Faissol Attux, Murilo Bellezoni Loiola, Ricardo Suyama, Leandro Nunes de Castro,
Fernando José Von Zuben, and João Marcos Travassos Romano
Volume 2003 (2003), Issue 8, Pages 740-747
Evolutionary Computation for Sensor Planning: The Task Distribution Plan, Enrique Dunn
and Gustavo Olague
Volume 2003 (2003), Issue 8, Pages 748-756
An Evolutionary Approach for Joint Blind Multichannel Estimation and Order Detection,
Chen Fangjiong, Sam Kwong, and Wei Gang
Volume 2003 (2003), Issue 8, Pages 757-765
Application of Evolution Strategies to the Design of Tracking Filters with a Large Number of
Specifications, Jesús García Herrero, Juan A. Besada Portas, Antonio Berlanga de Jesús,
José M. Molina López, Gonzalo de Miguel Vela, and José R. Casar Corredera
Volume 2003 (2003), Issue 8, Pages 766-779
Tuning Range Image Segmentation by Genetic Algorithm, Gianluca Pignalberi, Rita Cucchiara,
Luigi Cinque, and Stefano Levialdi
Volume 2003 (2003), Issue 8, Pages 780-790
Parameter Estimation of a Plucked String Synthesis Model Using a Genetic Algorithm with Perceptual
Fitness Calculation, Janne Riionheimo and Vesa Välimäki
Volume 2003 (2003), Issue 8, Pages 791-805
Optimization and Assessment of Wavelet Packet Decompositions with Evolutionary Computation,
Thomas Schell and Andreas Uhl
Volume 2003 (2003), Issue 8, Pages 806-813
On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition
Systems in Adverse Conditions, Sid-Ahmed Selouani and Douglas O'Shaughnessy
Volume 2003 (2003), Issue 8, Pages 814-823
Evolutionary Techniques for Image Processing a Large Dataset of Early Drosophila Gene Expression,
Alexander Spirov and David M. Holloway
Volume 2003 (2003), Issue 8, Pages 824-833
A Comparison of Evolutionary Algorithms for Tracking Time-Varying Recursive Systems,
Michael S. White and Stuart J. Flockton
Volume 2003 (2003), Issue 8, Pages 834-840

A Domain-Independent Window Approach to Multiclass Object Detection Using Genetic


Programming, Mengjie Zhang, Victor B. Ciesielski, and Peter Andreae
Volume 2003 (2003), Issue 8, Pages 841-859

EURASIP Journal on Applied Signal Processing 2003:8, 731–732


© 2003 Hindawi Publishing Corporation


Foreword
David E. Goldberg
Department of General Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Email: deg@uiuc.edu

I was delighted when I was asked to write a foreword to this special issue on genetic algorithms (GAs) and evolutionary computation (EC) in image and signal processing, edited by Riccardo Poli and Stefano Cagnoni, for two reasons. First, the special issue is another piece of the mounting evidence that GAs and EC are finding an important niche in the solution of difficult real-world problems. Second, in reviewing the contents of the special issue, I find it almost archetypal in its reflection of the GA/EC applications world of 2003. In the remainder of this discussion, I briefly review a number of reasons why genetic and evolutionary techniques are becoming more and more important in real problems and discuss some of the ways this issue serves to both demonstrate effective GA/EC application and foreshadow more signal and image processing by evolutionary and genetic means.
There are a number of reasons why GAs and EC are becoming more prevalent in real applications. The first reason
is what I call the buzz. Let us face it, GAs are cool. The very
idea of doing a Darwinian survival of the fittest and genetics
on a computer is neat. But cool and neat, while they may attract our attention, do not merit our sustained involvement.
Another reason why GAs have become more popular is the motivation from artificial systems. Although decades, even centuries, of optimization and operations research leave us with an impressive toolkit, the contingency basis of the methodology leaves us somewhat cold. By this I mean that the selection of an optimization or OR technique is contingent on the type of problem you face. If you have a linear problem with linear constraints, you choose linear programming. If you have a stage-decomposable problem, you choose dynamic programming. If you have a nonlinear problem with sufficiently pleasant constraints, you choose nonlinear programming, and so on. But the very nature of this list of methods that work on particular problems is part of the problem. One of the promises of biologically inspired techniques is a framework that does not vary and a larger class of problems that can be tackled within that framework.
This vision of greater robustness is now being realized,
but it is tied to whether the solutions obtained using these
techniques are both tractable and practical. Results from about a decade ago showed that simple GAs in common practice had a kind of Dr. Jekyll and Mr. Hyde nature. Simple genetic and evolutionary algorithms work well (subquadratically) on straightforward problems, but they require exponential times on more complex ones. This is not the place to review these results in detail, and the interested reader can look elsewhere (D. E. Goldberg, The Design of Innovation: Lessons from and for Competent Genetic Algorithms, Kluwer, Boston, 2002), but it suffices to say that work on adaptive and self-adaptive crossover and mutation operators is overcoming the tractability hurdle on real problems, resulting in what appear to be broadly scalable (subquadratic) or competent solvers.
Yet, theoretical tractability is of little solace to a practitioner who faces the daunting prospect of performing a million costly function evaluations on a 1000-variable problem. As a result, increasing theory, implementation, and application are showing the way toward principled efficiency enhancement using parallelization, time utilization, hybridization, and evaluation relaxation, and these methods are moving us from the realm of the competent (the tractable) to the realm of the practical.

These fundamental reasons (the buzz, the need, the tractability, and the practicality of modern genetic and evolutionary algorithms) are driving an ever-increasing interest in these methods, and this volume reflects that range of interest in terms of the application areas, operators, codings, and accoutrements on display.
In terms of application, the use of GAs and EC in
this volume spans such disparate applications as filter tuning, sensor planning, system identification, object detection,
bioinformatic image processing, 3D model interpretation,
and speech recognition. The range of different applications here is a reflection of the breadth of application elsewhere, and the utility of the GA/EC toolkit across this landscape is empirical evidence of the robustness of these methods.
Looking under the hood, we see a wide range of codings and operators in evidence, from floating-point vectors to permutations to program codes, from fixed to adaptive operators, and from crossover to mutation with various competitive or clustered (or niched) selection mechanisms.


Additionally, many of the papers here demonstrate an understanding of the importance of efficiency enhancement in real-world problems, and a number of them combine the best of genetic and evolutionary computation with local search to form useful and efficient hybrids that solve the problem. Too often, methods specialists are enamored with the method they helped invent or perfect, but in the real world, efficient solutions are obtained with an effective combination of global and local techniques.

In all, this special issue is a useful compendium for those interested in signal and image processing and the proper application of genetic and evolutionary methods to the unsolved problems of these domains. To the field of genetic and evolutionary computation, this special issue is growing evidence of the importance of what that field does in areas of human endeavor that matter. To audience members in both camps, I recommend without reservation that you study this special issue, and absorb and apply its many lessons.
David E. Goldberg

David E. Goldberg is Jerry S. Dobrovolny Distinguished Professor of Entrepreneurial Engineering in the Department of General Engineering at the University of Illinois at Urbana-Champaign (UIUC). He is also Director of the Illinois Genetic Algorithms Laboratory and is an affiliate of the Technology Entrepreneur Center and the National Center for Supercomputing Applications. He is a 1985 recipient of a US National Science Foundation Presidential Young Investigator Award, and in 1995, he was named an Associate of the Center for Advanced Study at UIUC. He was a Founding Chairman of the International Society for Genetic and Evolutionary Computation, and his book, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989), is the fourth most widely cited reference in computer science according to CiteSeer. He has just completed a new monograph, The Design of Innovation (Kluwer, 2002), that shows how to design scalable genetic algorithms and how such algorithms are similar to certain processes of human innovation.

EURASIP Journal on Applied Signal Processing 2003:8, 733–739


© 2003 Hindawi Publishing Corporation


Editorial
Riccardo Poli
Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK
Email: rpoli@essex.ac.uk

Stefano Cagnoni
Department of Computer Engineering, University of Parma, 43100 Parma, Italy
Email: cagnoni@ce.unipr.it

1. INTRODUCTION

Darwinian evolution is probably the most intriguing and powerful mechanism of nature mankind has ever discovered. Its power is evident in the impressive level of adaptation reached by all species of animals and plants in nature. It is intriguing because, despite its simplicity and randomness, it produces incredible complexity in a way that appears to be very directed, almost purposeful. As with other powerful natural phenomena, it is no surprise that several decades ago a few brilliant researchers in engineering and computer science started wondering whether they could steal the secrets behind Darwinian evolution and use them to solve problems of practical interest in a variety of application domains. Those people were pioneers of a new field which, more than 30 years after its inception, is now big, well established, and goes under the name of genetic and evolutionary computation (GEC).
An almost endless number of results and applications of
evolutionary algorithms have been reported in the literature,
showing that the ideas of these pioneers were indeed right.
Nowadays, evolutionary techniques can routinely solve problems in domains such as automatic design, optimisation, pattern recognition, control, and many others. Until recently,
however, only very occasionally could one claim that GEC
techniques approached the performance of human experts in
these same domains, particularly in the case of large scale applications and complex engineering problems. This is why,
initially, successful applications of GEC techniques to the fields of computer vision, image analysis, and signal processing were few and far between. Towards the late 1990s, however, research interest in these areas seemed to be growing rapidly, and the time seemed right for the creation of an infrastructure that could foster interaction between researchers in this area. This is what led, in early 1998, the two editors of this special issue, together with people from various other European institutions, to create a working group of the European Network of Excellence in Evolutionary Computation, entirely devoted to the applications of evolutionary algorithms to image analysis and signal processing. The working group organises a regular meeting, the European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP), which reached its fifth edition this year. This event gives European and non-European
researchers, as well as people from industry, an opportunity
to present their latest research and discuss current developments and applications, while also fostering closer interaction between members of the GEC, image analysis, and signal processing scientific communities. However, the event is, and intends to
remain, a workshop. Therefore, the work presented there can
never have the depth allowed by more substantial and mature
archival publications such as this journal.
This special issue of EURASIP JASP on GEC for signal processing and image analysis, being the first of its kind, has offered computer scientists and engineers from around the world a unique opportunity to submit their best mature research for inclusion in this unified, high-quality venue.
The timing of this special issue could not have been better;
well over thirty papers were submitted by contributors from
around the world. The papers were reviewed by a pool of over
thirty international expert reviewers. Only about one third
passed our strict criteria for acceptance and are now in this
volume.
The rest of this editorial is organised as follows. In
Section 2, we will provide a gentle introduction to the basics of evolutionary computation. In Section 3, we describe
each of the papers present in this special issue, briefly summarising, for each one, the problem considered and the evolutionary techniques adopted to tackle it. In Section 4, we
provide our final remarks and acknowledgments, while in the
appendix, we give a brief commented bibliography with suggested further reading.

2. EVOLUTIONARY COMPUTATION: THE BASICS

What were the main secrets behind Darwinian evolution that the pioneers of GEC stole to make them the propelling fuel of evolutionary computation processes?
Inheritance
Individuals have a genetic representation (in nature, the chromosomes and the DNA) such that it is possible for the offspring of an individual to inherit some of the features of its parent.

Variation
The offspring are not exact copies of the parents; instead, reproduction involves mechanisms that create innovation as new generations are born.

Natural selection
Individuals best adapted to the environment have longer lives and higher chances of mating and spreading their genetic makeup.
Clearly, there is a lot more to natural evolution than these main forces. However, as with many other nature-inspired techniques, not all the details are necessary to obtain working models of a natural system. The three ingredients listed above are in fact sufficient to obtain artificial systems showing the main characteristic of natural evolution: the ability to search for highly fit individuals.
For all these ingredients (representation, variation, and selection), one can focus on different realisations. For example, in nature, variation is produced both through mutations of the genome and through the effect of sexually recombining the genetic material coming from the parents when obtaining the offspring chromosomes (crossover). This is why many different classes of evolutionary algorithms have been proposed over the years. So, depending on the structures undergoing evolution, on the reproduction strategies and the variation (or genetic) operators adopted, and so on, evolutionary algorithms can be grouped into several evolutionary paradigms: genetic algorithms (GAs) [1], genetic programming (GP) [2], evolution strategies (ESs) [3, 4], and so forth.
All the inventors of these different evolutionary algorithms (EAs) have had to make choices as to which bits of nature have a corresponding component in their algorithms. These choices are summarised in the nature-to-computer mapping shown in Table 1. That is, the notion of individual in nature corresponds to a tentative solution to a problem of interest in an EA. The fitness (ability to reproduce and have fertile offspring that reach the age of reproduction) of natural individuals corresponds to the objective function used to evaluate the quality of the tentative solutions in the computer. The genetic variation processes of mutation and recombination are seen as mechanisms (search operators) to generate new tentative solutions to the problem. Finally, natural selection is interpreted as a mechanism to promote the diffusion and mixing of the genetic material of individuals representing good-quality solutions and therefore having the potential to create even fitter individuals (better solutions).

Table 1: Nature-to-computer mapping at the basis of EAs.

Nature            | Computer
Individual        | Solution to a problem
Population        | Set of solutions
Fitness           | Quality of a solution
Chromosome        | Representation for a solution (e.g., set of parameters)
Gene              | Part of the representation of a solution (e.g., parameter or degree of freedom)
Crossover         | Search operator
Mutation          | Search operator
Natural selection | Promoting the reuse of good (sub-)solutions

Despite their differences, most EAs have the following general form.

(1) Initialise the population and evaluate the fitness of each population member.
(2) Repeat:
    (a) Select a subpopulation for reproduction on the basis of fitness (selection).
    (b) Copy some of the selected individuals without change (cloning or reproduction).
    (c) Recombine the genes of selected parents (recombination or crossover).
    (d) Mutate the mated population stochastically (mutation).
    (e) Evaluate the fitness of the new population.
    (f) Select the survivors based on fitness.
Not all these steps are present in all EAs. For example, in modern GAs [5] and in GP, phase (a) is part of phases (b) and (c), while phase (f) is absent. Such an algorithm is said to be generational because there is no overlap between generations (i.e., the offspring population always replaces the parent population). In generational EAs, cloning is used to simulate the survival of parents for more than one generation.
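For concreteness, the generational scheme above can be sketched in a few lines of Python. This is an illustration of ours, not code from any of the papers; the operator choices (tournament selection, uniform crossover) and all parameter values are arbitrary demo settings.

```python
import random

def evolve(init, fitness, select, crossover, mutate,
           pop_size=50, generations=100, p_c=0.7):
    """Generational EA skeleton: selection, crossover, cloning, mutation;
    the offspring population always replaces the parent population."""
    pop = [init() for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            if random.random() < p_c:                 # recombination
                child = crossover(select(pop, fitness), select(pop, fitness))
            else:                                     # cloning
                child = select(pop, fitness)
            new_pop.append(mutate(child))             # mutate before insertion
        pop = new_pop                                 # generational replacement
    return max(pop, key=fitness)

# Usage on the OneMax toy problem (maximise the number of 1s in a 20-bit string).
random.seed(0)  # reproducible demo run
tournament = lambda pop, fit: max(random.sample(pop, 3), key=fit)
best = evolve(
    init=lambda: [random.randint(0, 1) for _ in range(20)],
    fitness=sum,
    select=tournament,
    crossover=lambda a, b: [random.choice(g) for g in zip(a, b)],  # uniform
    mutate=lambda c: [1 - g if random.random() < 0.05 else g for g in c],
)
```

Any of the representations, selection schemes, and operators discussed in the rest of this section could be swapped into the same skeleton.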
In the following, we will analyse the various components of an EA in more detail, mainly concentrating on the
GA, although most of what we will say also applies to other
paradigms.
2.1. Representations

Traditionally, in GAs, solutions are encoded as binary strings. Typically, an adult individual (a solution for a problem) takes the form of a vector of numbers. These are often interpreted as parameters (for a plant, for a design, etc.), but in combinatorial optimisation problems, these numbers can actually represent configurations, choices, schedules, paths, and so on. Anything that can be represented on a digital computer can also be represented in a GA using a binary representation. This is why, at least in principle, GAs have a really broad applicability. However, other nonbinary representations are available, which may be more suitable, for example, for problems with real-valued parameters.


Because normally the user of a GA has no idea as to what constitutes a good initial set of choices/parameters for adult individuals (tentative solutions to a problem), the chromosomes to be manipulated by the GA are normally initialised in an entirely random manner. That is, the initial population is a set of random binary strings or of random real-valued vectors.
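As a minimal sketch of the binary encoding and random initialisation just described, the following uses a fixed-point mapping of each group of bits to a real parameter. The mapping, the chromosome length, and the parameter range are our own illustrative choices, not ones prescribed by the editorial.

```python
import random

def decode(chromosome, n_params, lo=-5.0, hi=5.0):
    """Fixed-point decoding of a binary chromosome into n_params real
    values in [lo, hi]: one common choice among many."""
    bits = len(chromosome) // n_params
    values = []
    for k in range(n_params):
        gene = chromosome[k * bits:(k + 1) * bits]
        as_int = int("".join(map(str, gene)), 2)       # bit group -> integer
        values.append(lo + (hi - lo) * as_int / (2 ** bits - 1))
    return values

# Entirely random initialisation, as described above: no prior knowledge used.
population = [[random.randint(0, 1) for _ in range(30)] for _ in range(20)]
params = decode(population[0], n_params=3)             # three values in [-5, 5]
```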


2.2. Selection in GAs


Selection is the operation by which individuals (i.e., their chromosomes) are selected for mating or cloning. To emulate natural selection, individuals with a higher fitness should be selected with higher probability. There are many models of selection, some of which, despite fitting well the biologically inspired computational model and producing effective results, are not biologically plausible. We briefly describe them below.
Fitness proportionate selection, besides being the most direct translation into the computational model of the probabilistic principles of evolution, is probably the most widely
used selection scheme. This works as follows. Let N be the population size, f_i the fitness of individual i, and f̄ = (1/N) Σ_j f_j the average population fitness. Then, in fitness proportionate selection, individual i is selected for reproduction with a probability

    p_i = f_i / Σ_j f_j = f_i / (f̄ N).   (1)

In normal GAs, populations are not allowed to grow or shrink, so N individuals have to be selected for reproduction. Therefore, the expected number of selected copies of each individual is

    N_i = p_i N = f_i / f̄.   (2)

Figure 1: Three crossover operators for binary GAs: (a) one-point crossover, (b) two-point crossover, (c) uniform crossover.

(2)

So, individuals with above-average quality (f_i > f̄) tend to be selected more than once for mating or cloning, while individuals below the average tend not to be used.

Tournament selection, instead, works as follows. To select an individual, a group of T (T ≥ 2) random individuals is first created. Then the individual with the highest fitness in the group is selected and the others are discarded (hence the tournament).

Another alternative is rank selection, where individuals are first sorted (ranked) on the basis of their fitness, so that if an individual i has fitness f_i > f_j, then its rank satisfies i < j. Then each individual is assigned a probability p_i of being selected, drawn from a given distribution (typically a monotonic, rapidly decreasing function), with the constraint that Σ_i p_i = 1.
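The three selection schemes just described might be sketched as follows. This is a hedged illustration of ours: `fit` is an assumed fitness callable, fitnesses are assumed positive for the proportionate scheme, and the linear rank weights are just one possible monotonically decreasing distribution.

```python
import random

def proportionate(pop, fit):
    """Fitness proportionate selection: p_i = f_i / sum_j f_j, as in (1)."""
    weights = [fit(ind) for ind in pop]        # assumes positive fitnesses
    return random.choices(pop, weights=weights, k=1)[0]

def tournament(pop, fit, T=2):
    """Draw T random individuals and keep the fittest; discard the rest."""
    return max(random.sample(pop, T), key=fit)

def rank_based(pop, fit):
    """Rank selection: probabilities follow a decreasing function of rank."""
    ranked = sorted(pop, key=fit, reverse=True)   # rank 1 = fittest
    n = len(ranked)
    weights = [n - r for r in range(n)]           # monotonically decreasing
    return random.choices(ranked, weights=weights, k=1)[0]
```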
2.3. Operators
EAs work well only if their genetic operators allow an efficient and effective search of the space of tentative solutions.

One desirable property of recombination operators is to guarantee that two parents sharing a useful common characteristic always transmit that characteristic to their offspring. Another important property is to also guarantee that the different characteristics distinguishing two parents may all be inherited by their offspring. For binary GAs, there are many crossover operators with these properties.

One-point crossover, for example, aligns the two parent chromosomes (bit strings), then cuts them at a randomly chosen common point and exchanges the right-hand side (or left-hand side) subchromosomes (see Figure 1a). In two-point crossover, chromosomes are cut at two randomly chosen crossover points and their ends are swapped (see Figure 1b). A more modern operator, uniform crossover, builds the offspring, one bit at a time, by randomly selecting one of the corresponding bits from the parents (see Figure 1c).
Normally, crossover is applied to the individuals of a population with a constant probability p_c (often p_c ∈ [0.5, 0.8]). Cloning is then applied with probability 1 − p_c to keep the number of individuals in each generation constant.

Mutation is the second main genetic operator used in GAs. A variety of mutation operators exist. Mutation typically consists of making (usually small) alterations to the values of one or more genes in a chromosome. Mutation is often applied to the individuals produced by crossover and cloning before they are added to the new population. In binary chromosomes, mutation often consists of inverting random bits of the genotypes (see Figure 2).

Figure 2: Bitwise mutation in binary GAs: the parent 1010101010, with the fourth bit inverted at the mutation site, yields the offspring 1011101010.

The main goal with which mutation is applied is the preservation of diversity, which helps GAs to explore as much of the search space as possible. However, due to its random nature, mutation may have disruptive effects on evolution if it occurs too often. Therefore, in GAs, mutation is usually applied to genes with a very low probability.
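The binary crossover and mutation operators described above might look like this in Python. An illustrative sketch of ours, with chromosomes as plain lists of 0/1 integers:

```python
import random

def one_point(p1, p2):
    """Cut both parents at a common random point; swap the tails (Figure 1a)."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2):
    """Cut at two common random points; swap the middle segments (Figure 1b)."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform(p1, p2):
    """Build one offspring bit by bit from randomly chosen parents (Figure 1c)."""
    return [random.choice(pair) for pair in zip(p1, p2)]

def bit_flip(chrom, p_m=0.01):
    """Bitwise mutation: invert each gene independently with probability p_m."""
    return [1 - g if random.random() < p_m else g for g in chrom]
```

Note that one-point and two-point crossover preserve, between the two offspring, exactly the multiset of bits present in the parents, which is one way of seeing the inheritance properties discussed above.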
In real-valued GAs, chromosomes have the form x = (x_1, . . . , x_n), where each gene x_i is represented by a floating-point number. In these GAs, crossover is often seen as an interpolation process in a multidimensional Euclidean space. So, the components of the offspring o are calculated from the corresponding components of the parents p' and p'' as follows:

    o_i = p'_i + r (p''_i − p'_i),   (3)

where r is a random number in the interval [0, 1] (see Figure 3a). Alternatively, crossover can be seen as the exploration of a multidimensional hyperparallelepiped defined by the parents (see Figure 3b); that is, the components o_i are chosen uniformly at random within the intervals

    [min(p'_i, p''_i), max(p'_i, p''_i)].   (4)

Mutation is often seen as the addition of a small random variation (e.g., Gaussian noise) to a point in a multidimensional space (see Figure 3c).

Figure 3: (a), (b) crossover operators and (c) mutation for real-valued GAs.
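Equations (3) and (4) and Gaussian mutation translate directly into code. A sketch of ours, assuming list-of-float chromosomes; in (3) a single r is shared by all components, so the offspring lies on the segment joining the parents.

```python
import random

def interpolate(p1, p2):
    """Arithmetic crossover, eq. (3): o_i = p'_i + r (p''_i - p'_i)."""
    r = random.random()
    return [a + r * (b - a) for a, b in zip(p1, p2)]

def box_crossover(p1, p2):
    """Eq. (4): each o_i drawn uniformly from [min(p'_i, p''_i), max(p'_i, p''_i)],
    i.e., a point inside the hyperparallelepiped spanned by the parents."""
    return [random.uniform(min(a, b), max(a, b)) for a, b in zip(p1, p2)]

def gaussian_mutation(x, sigma=0.1):
    """Add small zero-mean Gaussian noise to every component."""
    return [v + random.gauss(0.0, sigma) for v in x]
```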

2.4. Other GEC paradigms


As mentioned before, the principles on which GAs are based are also shared by many other EAs. However, the use of different representations and operators has led to the development of a number of paradigms, each having its own peculiarities. With no pretence of being exhaustive, in the following, we will briefly mention those paradigms, other than GAs, that are used in the papers included in this special issue.

Genetic programming [2, 6] is a variant of GA in which the individuals being evolved are syntax trees, typically representing computer programs. The trees are created using user-defined primitive sets, which typically include input variables, constants, and a variety of functions or instructions.

The syntax trees are manipulated by specialised forms of crossover and mutation that guarantee the syntactic validity of the offspring. The fitness of the individual trees in the population is evaluated by running the corresponding programs (typically multiple times, for different values of their input variables).

Evolution strategies [3, 4] are real-valued EAs where mutation is the key variation operator (unlike GAs, where crossover plays that role). Mutation typically consists of adding zero-mean Gaussian deviates to the individuals being optimised, with the mutation standard deviation being varied dynamically so as to maximise the performance of the algorithm.
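One classic way of varying the mutation standard deviation dynamically is Rechenberg's 1/5 success rule, illustrated here in a (1+1)-ES sketch. This is our own illustration under that assumption; the 1.5 and 0.82 adaptation factors and the 20-step window are conventional textbook choices, not values taken from this editorial.

```python
import random

def one_plus_one_es(f, x, sigma=1.0, iters=200):
    """(1+1)-ES minimising f, with step size sigma adapted by the
    1/5 success rule: grow sigma when mutations succeed often, shrink
    it when they rarely do."""
    successes = 0
    for t in range(1, iters + 1):
        y = [v + random.gauss(0.0, sigma) for v in x]  # Gaussian mutation
        if f(y) < f(x):                                # elitist replacement
            x, successes = y, successes + 1
        if t % 20 == 0:                                # adapt every 20 steps
            sigma *= 1.5 if successes / 20 > 0.2 else 0.82
            successes = 0
    return x

# Minimise the sphere function from a distant starting point.
random.seed(1)
best = one_plus_one_es(lambda v: sum(c * c for c in v), [5.0, -5.0])
```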

Artificial immune systems (see [7, Part III, Chapters 10–13] or [8] for an extensive introduction) are distributed computational systems inspired by biological immune systems, which can recognise patterns and can remember previously seen patterns in an efficient and effective way. These systems are very close relatives of EAs (sometimes involving an evolutionary process in their inner mechanics), although they use a different biological metaphor.
3. THE PAPERS IN THIS SPECIAL ISSUE

In their paper entitled Blind search for optimal Wiener equalizers using an artificial immune network model, Attux et al.
exploit recent advances in the field of artificial immune systems to obtain optimum equalisers for noisy communication
channels, using a technology that does not require the availability of clean samples of the input signal. This approach is
very successful in a variety of test equalisation problems. The
approach is also compared with a more traditional EA, a GA
with niching, showing superior performance.
The paper by Dunn and Olague, entitled Evolutionary
computation for sensor planning, shows how well-designed
evolutionary computation techniques can solve the problem
of optimally specifying sensing tasks for a workcell provided
with multiple manipulators and cameras. The problem is
NP-hard, effectively being a composition of a set partitioning problem and multiple traveling salesperson problems.
Nonetheless, thanks to clever representations and the use of
evolutionary search, this system is able to solve the problem,
providing solutions of quality very close to that of the solutions obtained via exhaustive search, but in a tiny fraction of
the time.
The paper entitled An evolutionary approach for joint blind multichannel estimation and order detection by Fangjiong et al. presents a method for the detection of the order and the estimation of the parameters of a single-input multiple-output channel. The method is based on a hybrid GA with specially designed operators. The method shows performance comparable with that of existing closed-form approaches which, however, are much more restricted in that they either assume that the channel order is known or treat the problems of order estimation and parameter estimation separately.
In Application of evolution strategies to the design of tracking filters with a large number of specifications, Herrero et al. attack the problem of tracking civil aircraft from radar information within the extremely tight performance constraints imposed by a civil aviation standard. They use interacting multiple model filters optimised using an ES and a multiobjective optimisation approach, obtaining a high-performance aircraft tracker.
Making EAs easier to apply for general practitioners by self-tuning their parameters is one of the main aims with which Pignalberi et al. developed GASE, a GA-based tool for range image segmentation. The system, along with some practical results, is described in the paper Tuning range image segmentation by genetic algorithm. A multiobjective fitness function is adopted to take into consideration problems that are typically encountered in range

image segmentation.
The paper Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness
calculation describes the use of GAs to estimate the control parameters for a widely used plucked string synthesis
model. Using GAs, Riionheimo and Välimäki have been able
to automate parameter extraction, which had been formerly
achieved only through semiautomatic approaches, obtaining
comparable results, both in quantitative and in qualitative
terms. An interesting feature of the approach is the inclusion of knowledge about perceptual properties of the human
hearing system into the fitness function.
Schell and Uhl compare results obtained with a GA-based approach to the near-best-basis (NBB) algorithm, a well-known suboptimal algorithm for wavelet packet decomposition. In their paper Optimization and assessment of wavelet packet decompositions with evolutionary computation, they highlight the problem of finding good cost functions in terms of correlation with actual image quality. They show that GAs provide lower-cost solutions that, however, yield lower-quality images than NBB.
In the paper entitled On the use of evolutionary algorithms
to improve the robustness of continuous speech recognition
systems in adverse conditions, Selouani and O'Shaughnessy
show how a GA can tune a system based on state-of-the-art
speech recognition technology so as to maximise its recognition accuracy in the presence of severe noise. This hybrid
of evolution and conventional signal processing algorithms
amply outperforms nonadaptive systems. The EA used is a
GA with real-coded representation, rank selection, a heuristic type of crossover, and a nonuniform mutation operator.
The paper Evolutionary techniques for image processing a
large dataset of early Drosophila gene expression by Spirov and
Holloway describes an evolutionary approach to processing confocal microscopy images of patterns of
activity for genes governing early Drosophila development.
The problem is approached using plain GAs, a simplex approach, and a hybrid between these two.
The use of GAs to track time-varying systems based on
recursive models is tackled in A comparison of evolutionary
algorithms for tracking time-varying recursive systems. The paper first compares a plain GA with a GA variant, called random immigrant strategy, showing that the latter performs
better in tracking time-varying systems even if it has problems with fast-varying systems. Finally, a hybrid combination
of GAs and local search that is able to tackle even such hard
tasks is proposed.
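The random immigrant strategy compared in that paper can be sketched in a few lines. The following Python fragment is our own minimal illustration, not the authors' implementation; the real-valued encoding, the bounds, and the `fitness_fn` argument are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_immigrants_step(pop, fitness_fn, n_immigrants, lo=-1.0, hi=1.0):
    """Replace the worst n_immigrants individuals with random newcomers.

    Injecting fresh random individuals every generation maintains diversity,
    which is what lets a GA track a time-varying fitness landscape.
    """
    fit = np.array([fitness_fn(ind) for ind in pop])
    worst = np.argsort(fit)[:n_immigrants]   # indices of the worst individuals
    pop = pop.copy()
    pop[worst] = rng.uniform(lo, hi, size=(n_immigrants, pop.shape[1]))
    # ...selection, crossover, and mutation of the survivors would follow
    # here in a complete GA generation.
    return pop
```

In a full GA, this step would precede the usual selection, crossover, and mutation operators applied to the mixed population.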
Zhang et al., in their paper A domain-independent window approach to multiclass object detection using genetic programming, propose an interesting approach in which GP is
used to both detect and localise features of interest. The approach is compared with a neural network classifier, used
as reference, showing that GP-evolved programs can provide significantly lower false-alarm rates. Within the proposed approach, the choice of the primitive set is also discussed, comparing results obtained with two different sets:
one comprises only the four basic arithmetical operators, and the other also includes transcendental functions. The results reported in the paper provide interesting clues to practitioners who would like to use GP to tackle image processing tasks.

4. CONCLUSIONS

The guest editors hope that the readership of the journal will enjoy reading the papers in this special issue as we did ourselves. We hope that the broadness of domains to which EAs can be applied, demonstrated by the contents of this issue, will convince other researchers in image analysis and signal processing to get acquainted with the exciting world of evolutionary computation and to apply its powerful techniques to solve important new and old problems in these areas.

APPENDIX
POINTERS TO FURTHER READING IN GEC

1. David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts, 1989. A classic book on genetic algorithms and classifier systems.
2. David E. Goldberg. The Design of Innovation: Lessons from and for Competent Genetic Algorithms. Kluwer Academic Publishers, Boston, 2002. An excellent, long-awaited follow-up of Goldberg's first book.
3. Melanie Mitchell. An Introduction to Genetic Algorithms. A Bradford Book, MIT Press, Cambridge, MA, 1996. A good introduction to genetic algorithms.
4. John H. Holland. Adaptation in Natural and Artificial Systems, second edition. A Bradford Book, MIT Press, Cambridge, MA, 1992. Second edition of a classic from the inventor of genetic algorithms.
5. Thomas Bäck and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, vol. 1, no. 1, pp. 1-23, 1993. A good introduction to parameter optimisation using EAs.
6. T. Bäck, D. B. Fogel, and T. Michalewicz. Evolutionary Computation 1: Basic Algorithms and Operators. Institute of Physics Publishing, 2000. A modern introduction to evolutionary algorithms. Good both for novices and more expert readers.
7. John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992. The bible of genetic programming by the founder of the field. Followed by GP II (1994), GP III (1999), and GP IV (forthcoming).
8. Wolfgang Banzhaf, Peter Nordin, Robert E. Keller, and Frank D. Francone. Genetic Programming: An Introduction; On the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann, 1998. An excellent textbook on GP.
9. W. B. Langdon and Riccardo Poli. Foundations of Genetic Programming. Springer, February 2002. The only book entirely devoted to the theory of GP and its relations with the GA theory.
10. Proceedings of the International Conference on Genetic Algorithms (ICGA). ICGA is the oldest conference on EAs.
11. Proceedings of the Genetic Programming Conference. This was the first conference entirely devoted to GP.
12. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). Born in 1999 from the recombination of ICGA and the GP conference mentioned above, GECCO is the largest conference in the field.
13. Proceedings of the Foundations of Genetic Algorithms (FOGA) Workshop. FOGA is a biannual, small but very prestigious and highly selective workshop. It is mainly devoted to the theoretical foundations of EAs.
14. Proceedings of the Congress on Evolutionary Computation (CEC). CEC is a large conference under the patronage of IEEE.
15. Proceedings of Parallel Problem Solving from Nature (PPSN). This is a large biannual European conference, probably the oldest of its kind in Europe.
16. Proceedings of the European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP). This is a small workshop, reaching its fifth edition in 2003. It is the only event worldwide uniquely devoted to the research topics covered by this special issue.
17. Proceedings of the European Conference on Genetic Programming. EuroGP was the first European event entirely devoted to GP. Run as a workshop in 1998 and 1999, it became a conference in 2000. It has now reached its sixth edition with EuroGP 2003 held at the University of Essex. Currently, this is the largest event worldwide solely devoted to GP.

ACKNOWLEDGMENTS
The guest editors would like to thank Professor David E.
Goldberg for his insightful foreword, the former and present
editors-in-chief of EURASIP JASP, Professor K. J. Ray Liu
and Professor Marc Moonen, for their support in putting together this special issue, and all the reviewers who have generously devoted their time to help ensure the highest possible
quality for the papers in this volume. All the authors of the
manuscripts who have contributed to this special issue are
also warmly thanked.
Riccardo Poli
Stefano Cagnoni

REFERENCES
[1] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[2] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge,
Mass, USA, 1992.

Editorial
[3] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, Stuttgart, Germany, 1973.
[4] H.-P. Schwefel, Numerical Optimization of Computer Models,
Wiley, Chichester, UK, 1981.
[5] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press,
Cambridge, Mass, USA, 1996.
[6] W. B. Langdon and R. Poli, Foundations of Genetic Programming, Springer-Verlag, New York, NY, USA, 2002.
[7] D. Corne, M. Dorigo, and F. Glover, Eds., New Ideas in Optimization, McGraw-Hill, London, UK, 1999.
[8] D. Dasgupta, Ed., Artificial Immune Systems and Their Applications, Springer-Verlag, New York, NY, USA, 1999.
Riccardo Poli received a Ph.D. degree in
bioengineering (1993) from the University
of Florence, Italy, where he worked on image analysis, genetic algorithms, and neural networks until 1994. From 1994 to 2001,
he was a lecturer and then a reader in
the School of Computer Science of the
University of Birmingham, UK. In 2001,
he became a Professor at the Department
of Computer Science of the University of
Essex, where he founded the Natural and Evolutionary Computation Group. Professor Poli has published around 130 papers on
evolutionary algorithms, genetic programming, neural networks,
and image analysis and signal processing, including the book Foundations of Genetic Programming (Springer, 2002). He has been
Co-chair of EuroGP, the European Conference on GP, in 1998, 1999, 2000, and 2003. He was Chair of the GP theme at the Genetic and Evolutionary Computation Conference (GECCO) 2002 and Co-chair of the Foundations of Genetic Algorithms Workshop (FOGA) 2002. He will be General Chair of GECCO 2004. Professor Poli is an Associate Editor of Evolutionary Computation (MIT
Press) and Genetic Programming and Evolvable Machines (Kluwer),
a reviewer for 12 journals, and has been a programme committee
member of 40 international events.
Stefano Cagnoni has been an Assistant Professor in the Department of Computer Engineering of the University of Parma since
1997. He received the Ph.D. degree in bioengineering in 1993. In 1994, he was a
Visiting Scientist at the Whitaker College
Biomedical Imaging and Computation Laboratory at the Massachusetts Institute of
Technology. His main research interests are
in computer vision, evolutionary computation, and robotics. As a member of EvoNet, the European Network of Excellence in Evolutionary Computation, he has chaired
the EvoIASP working group on evolutionary computation in image analysis and signal processing, and the corresponding workshop since its first edition in 1999. He is a reviewer for several journals and a programme committee member of several international
events.


EURASIP Journal on Applied Signal Processing 2003:8, 740-747
© 2003 Hindawi Publishing Corporation


Blind Search for Optimal Wiener Equalizers Using an Artificial Immune Network Model
Romis Ribeiro de Faissol Attux
DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil
Email: romisri@decom.fee.unicamp.br

Murilo Bellezoni Loiola


DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil
Email: mloiola@decom.fee.unicamp.br

Ricardo Suyama
DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil
Email: rsuyama@decom.fee.unicamp.br

Leandro Nunes de Castro


DCA, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil
Email: lnunes@dca.fee.unicamp.br

Fernando José Von Zuben


DCA, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil
Email: vonzuben@dca.fee.unicamp.br

João Marcos Travassos Romano
DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil
Email: romano@decom.fee.unicamp.br
Received 28 June 2002 and in revised form 1 December 2002
This work proposes a framework to determine the optimal Wiener equalizer by using an artificial immune network model together
with the constant modulus (CM) cost function. This study was primarily motivated by recent theoretical results concerning the
CM criterion and its relation to the Wiener approach. The proposed immune-based technique was tested under different channel
models and filter orders, and benchmarked against a procedure using a genetic algorithm with niching. The results demonstrated
that the proposed strategy has a clear superiority when compared with the more traditional technique. The proposed algorithm
presents interesting features from the perspective of multimodal search, being capable of determining the optimal Wiener equalizer
in most runs for all tested channels.
Keywords and phrases: blind equalization, constant modulus algorithm, evolutionary computation, artificial immune systems,
immune network model.

1. INTRODUCTION

The constant modulus (CM) criterion [1, 2, 3] is a broadly studied blind equalization technique. The last 20 years have
seen the proposal of many relevant works scrutinizing the basis of the CM criterion and its relation to other criteria.
These works pointed out two aspects that deserve to be
highlighted [3, 4]:
(1) the CM cost function is multimodal;
(2) there is an intimate relationship between CM minima and some Wiener optima.
In particular, the literature indicates a one-to-one relationship between the best Wiener solutions and the minima
of the CM criterion.
From these considerations, it is possible to make a strong
claim: if one can determine the CM global minima, then the
best possible Wiener receiver can also be evaluated.

This suggestion opens an exciting perspective: the possibility of obtaining the best equalizer (in the mean square error
sense) without a desired signal, that is, by using a blind or unsupervised search strategy. To achieve this goal, it is necessary
to propose a method capable of locating, over a set of local minima, the best CM minimum in most of the runs performed by the algorithm. Evolutionary algorithms (EAs) are
particularly suitable to determine the optimal Wiener equalizer because they present a high capability of performing
an exploratory search when a priori knowledge is not available.
This paper proposes to apply the optimization version
of an artificial immune network model, named opt-aiNet
[5], to the problem of determining the optimal Wiener
solution. By combining the CM criterion with the opt-aiNet algorithm, this paper introduces a novel framework
(CM + opt-aiNet) to obtain the optimal receiver.
Different channel models and filter orders were used to
evaluate the potential for finding the global Wiener minimum. In some cases, the proposed strategy was compared
with an approach based on genetic algorithms with niching
[6], which proved to be a valuable tool to solve this problem,
and thus benchmark the proposed technique. In all cases,
the obtained results validated the framework, demonstrating
that it is possible to find the optimal equalizer for a given
channel by using a powerful blind search technique.
The paper is organized as follows. Section 2 presents
some theoretical considerations on the equivalence between
the CM minima and Wiener solutions, a cornerstone of this
work. Section 3 introduces the immunologically inspired algorithm, named opt-aiNet, and places it in the context of
other search techniques, with particular emphasis on EAs.
Section 4 presents the simulation results and discusses the
performance of the algorithm by comparing it with a genetic
algorithm with niching. The final remarks and future trends
are presented in Section 5.
2. ADAPTIVE CRITERIA: THEORETICAL BASIS

The main goal of communications engineering is to provide adequate message interchange, through a certain channel, between a transmitter and a receiver. Nevertheless, the
channels introduce distortion in the transmitted message, which usually leads to severe degradation. A device named
equalizer filters the received signal in order to recover the
desired information. Figure 1 depicts the schematic channel
and equalizer representation in a communication system, together with their respective input and output signals.
From Figure 1, it can be inferred that the main goal of the
equalizer is to obtain an output signal as similar as possible
to the transmitted signal, except for a gain K and a delay d,
that is,
y(n) = K s(n - d),  (1)

which is the well-known zero-forcing (ZF) condition.


In most applications, the equalizer is implemented using
a finite impulse response (FIR) filter, which is a mathematically simple and inherently stable structure.

[Figure 1: Elements of a communication system: s(n) -> Channel -> x(n) -> Equalizer -> y(n).]

Its input-output relationship is given by
y(n) = w^T x(n),  (2)

where w is the equalizer coefficient vector of length L and x(n) = [x(n), x(n - 1), ..., x(n - L + 1)]^T is the input vector.
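As a quick sanity check of (2), the following sketch (our own illustration; the taps and samples are arbitrary values, not from the paper) builds the regressor x(n) explicitly and verifies that y(n) = w^T x(n) coincides with a truncated convolution:

```python
import numpy as np

L = 3
w = np.array([1.0, -0.5, 0.25])                 # illustrative equalizer taps
x_sig = np.array([1.0, -1.0, 1.0, 1.0, -1.0])   # illustrative received samples

def equalize(w, x_sig):
    """Return y(n) = w^T x(n) with x(n) = [x(n), x(n-1), ..., x(n-L+1)]^T."""
    L = len(w)
    y = np.zeros(len(x_sig))
    for n in range(len(x_sig)):
        # Regressor of the L most recent samples (zeros before time 0).
        xn = np.array([x_sig[n - k] if n - k >= 0 else 0.0 for k in range(L)])
        y[n] = w @ xn
    return y

y = equalize(w, x_sig)   # identical to np.convolve(x_sig, w) truncated to len(x_sig)
```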
Consequently, the central problem is to adjust the vector
w in order to obtain a good equalization condition, that is,
a condition as close as possible to the ZF (1). If it is possible
to count on a priori knowledge of the channel impulse response, the task becomes purely mathematical. When this is
not the case, it is necessary to determine a suitable optimization criterion.
When information about the transmitted signal is, at
least for some time, at hand, it is possible to make use of the
Wiener criterion, based on the following mean square error
(MSE) cost function:


J_W = E{|s(n - d) - y(n)|^2},  (3)

where d is the previously defined equalization delay. If this


delay is known a priori, J_W has a single minimum, named the Wiener solution. As a rule, each Wiener solution possesses a distinct MSE. This accounts for an important assertion: if the equalization delay is a free parameter of (3), then J_W has
several minima (multiple local optima). Among these many
optima, there is, usually, a single optimal Wiener solution, associated with an optimal delay.
As can be deduced from the comparison between (1) and
(3), the Wiener criterion is strongly related to the ZF condition. Hence, the determination of the optimal Wiener solution is very important and has a great practical appeal. However, there are two main difficulties: the use of samples of the
transmitted signal and the choice of d.
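To make the role of the delay d in (3) concrete, the sketch below (our own illustration; the channel, equalizer length, and block size are illustrative assumptions) computes one least-squares Wiener-like solution per candidate delay from a block of training data and keeps the delay with the smallest MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([1.0, 0.6])        # illustrative channel impulse response
L, N = 4, 5000                  # equalizer length, training-block size

s = rng.choice([-1.0, 1.0], size=N)   # BPSK-like transmitted signal
x = np.convolve(s, h)[:N]             # noiseless received signal

# Row n of X is the regressor [x(n), x(n-1), ..., x(n-L+1)]
X = np.column_stack([np.concatenate([np.zeros(k), x[:N - k]]) for k in range(L)])

mse = {}
for d in range(L + len(h) - 1):       # candidate equalization delays
    target = np.concatenate([np.zeros(d), s[:N - d]])     # s(n - d)
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    mse[d] = np.mean((X @ w - target) ** 2)

d_opt = min(mse, key=mse.get)         # delay of the optimal Wiener solution
```

Each delay yields its own Wiener solution with a distinct MSE, and `d_opt` selects the optimal one, exactly as discussed above.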
The drawback associated with the dependence on a pilot
signal was the main motivation behind the proposal of blind
techniques, that is, criteria which do not make use of samples
of s(n). Among these, the CM criterion has received special
attention in the last twenty years. Its cost function is given by
J_CM = E{(R_2 - |y(n)|^2)^2},  (4)

where

R_2 = E{|s(n)|^4} / E{|s(n)|^2}.  (5)

The cost function presented in (4) has multiple minima,


except in some trivial cases. Recent works [3, 4] have pointed
in the direction of an intimate relationship between these

minima and some Wiener solutions (the best ones). This is


the core of the CM part of the framework proposed here.
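A Monte Carlo estimate of (4) and (5) makes the criterion tangible. The sketch below (our own illustration, not code from the paper) uses a BPSK source, for which R_2 = 1, and shows that the CM cost is zero for a constant-modulus signal and strictly positive once intersymbol interference is introduced:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.choice([-1.0, 1.0], size=100_000)   # BPSK source: |s(n)| = 1

# Dispersion constant (5): R_2 = E[|s|^4] / E[|s|^2] (equal to 1 for BPSK)
R2 = np.mean(np.abs(s) ** 4) / np.mean(np.abs(s) ** 2)

def J_cm(y, R2):
    """Sample estimate of the CM cost (4): E[(R_2 - |y(n)|^2)^2]."""
    return np.mean((R2 - np.abs(y) ** 2) ** 2)

# A perfectly equalized constant-modulus signal gives J_CM = 0, while any
# residual intersymbol interference makes the cost strictly positive.
print(J_cm(s, R2))                               # -> 0.0
print(J_cm(0.8 * s + 0.3 * np.roll(s, 1), R2))   # > 0
```

Note that the cost is computed from the equalizer output alone, which is what makes the criterion blind.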
2.1. Relationship between CM minima and Wiener optima
The rationale of this work is to find an optimal method for
the design of blind equalizers. Since the notion of optimality
can be related to the concept of supervised adaptive filtering,
it is important to discuss the relationship between Wiener
and CM minima. This discussion relies on the following assumptions.
Assume that the best Wiener solutions are close to the
best CM minima so that each minimum of the former class
can be achieved from a minimum of the latter class through
a simple steepest descent algorithm (that will be further described). Therefore, to find the CM global optimum is equivalent to determining the optimal Wiener solution. We will always assume that there is at least one good Wiener solution,
that is, one that provides perfect recovery in the absence of
noise. Such assumption is not reasonable only in a few particular cases (e.g., when there is a channel zero at 1).
The main result of this claim is that it becomes feasible
to determine the best possible equalizer without supervision,
that is, by using a blind search strategy.
The key to accomplish such a demanding task on the CM
cost function is to use strategies capable of performing not
only global search but also multimodal search, such as EAs
with niching and the immunologically inspired technique to
be discussed in the next section.
Therefore, it is important to choose a method capable of
providing a good balance between exploration and exploitation of the search space. This balance allows for the algorithm
to exploit specific portions of the search space without compromising its global search potentialities. These features were
found in EAs with niching and a technique inspired by some
theories of how the human immune system works.
The last step of the framework is to refine the CM solution through the decision-directed (DD) algorithm in order
to compensate for the inherent difference between this and
the Wiener solution. The iterative expression of the DD algorithm is

w(n + 1) = w(n) + μ{dec[y(n)] - y(n)}x(n),  (6)

where dec[y(n)] is simply the slicer output and μ is the adaptation step size.
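A sketch of such a decision-directed refinement loop for real (BPSK-like) signals, where the slicer is simply sign(·), might look as follows; the step size and the loop structure are our own illustrative assumptions, not values from the paper:

```python
import numpy as np

def dd_refine(w, x_sig, mu=0.01):
    """Decision-directed refinement (6) of an equalizer w; slicer = sign."""
    w = w.astype(float).copy()
    L = len(w)
    for n in range(L - 1, len(x_sig)):
        xn = x_sig[n - L + 1:n + 1][::-1]   # [x(n), x(n-1), ..., x(n-L+1)]
        y = w @ xn                          # equalizer output y(n)
        e = np.sign(y) - y                  # dec[y(n)] - y(n)
        w = w + mu * e * xn                 # update (6)
    return w
```

For complex constellations the slicer would map y(n) to the nearest symbol and the regressor would appear conjugated in the update.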


In a previous work, the same task has been performed
using a genetic algorithm with niching, and good results were
reported [6], which will serve as a basis for comparison in
Section 4.
3. IMMUNOLOGY, ARTIFICIAL IMMUNE SYSTEMS, AND AN IMMUNE NETWORK MODEL

Together with many other bodily systems, such as the nervous and the endocrine systems, the immune system plays a
major role in maintaining life. Its primary functions are to
defend the body against foreign invaders (e.g., viruses, bacteria, funguses, etc.) and to eliminate the malfunctioning self
cells and debris.

The interest in studying and understanding the immune


system gave rise to immunology, a science approximately 200 years old. More recently, however, computer
scientists and engineers have found several interesting theories concerning the immune system and its functioning that
could be very helpful in the development of artificial systems
and computational tools capable of solving complex problems. The new field of research that emerged from this interdisciplinary research on immunology, computer science,
engineering, and others, is named artificial immune systems
[7].
3.1. The clonal selection and the immune network theories

Among the many theories used to explain how the immune


system works, two were explored in the development of the
algorithm used in this paper: (1) the clonal selection theory
[8] and (2) the immune network theory [9].
According to the clonal selection theory, when a disease-causing agent, named pathogen, enters the organism, a number of immune cells capable of recognizing this pathogen are
stimulated and start replicating themselves. The number of
copies each cell generates is directly proportional to the quality of the recognition of the pathogen, that is, the better a cell
recognizes a pathogen, the more copies of itself will be generated. During this self-replicating process, a mutation event
with high rates also occurs such that the progenies of a single
cell are slight variations of the parent cell. This mutational
process of the immune cells has the remarkable feature of
being inversely proportional to the quality of the pathogenic
recognition; the higher the quality of the recognition, the
smaller the mutation rate, and vice versa.
The clonal selection theory, briefly described above, is
broadly used to explain how the immune system defends the
body against pathogens. With a revolutionary view of the immune system, Jerne [9] proposed a novel theory to explain,
among many other things, how the immune system reacts
against itself. Jerne suggested that the immune cells are naturally capable of recognizing each other, and the immune system thus presents a dynamic behavior even in the absence
of pathogens. When an immune cell recognizes another immune cell, it is stimulated and the recognized cell is suppressed. In the original network theory, the results of stimulation and suppression were not clearly defined. Therefore,
different immune network models present distinct ways
accounting for network stimulation and suppression.
The discussion to be presented in Section 3.2 is restricted
to the specific artificial immune network model used in this
work, which combines clonal selection with the immune network theory.
3.2. An artificial immune network model to perform multimodal search

In [10], de Castro and Von Zuben proposed an artificial immune network model, named aiNet, inspired by the clonal
selection and network theories of the immune system. This
algorithm is demonstrated to be suitable to perform data

Blind Search for Optimal Equalizers Using Immune Networks


compression and clustering with the aid of some statistical
and graph theoretical strategies.
The aiNet adaptation procedure was further improved
in [5], and transformed into an algorithm to perform
multimodal search, named opt-aiNet. Several features of opt-aiNet can be highlighted. (1) It is a population-based search
technique, in which each individual of the population is a
real-valued vector represented according to the problem domain. (2) The size of the population, that is, the number of
individuals in the population, is dynamically adjusted. (3) It
is capable of locating multiple optima by making a balance
between exploitation (through a local search technique based
on clonal selection and expansion) and exploration (through
a dynamic diversity maintenance mechanism).
In a simplified form, the opt-aiNet algorithm can be
summarized with the procedure below.
(1) Initialization. Randomly initialize a population with a small number of individuals.
(2) While the stopping criterion is not met, do the following.
(2.1) Fitness evaluation. Determine the fitness (goodness or quality) of each individual of the population and normalize the vector of fitness.
(2.2) Replication. Generate a number of copies (offspring) of each individual.
(2.3) Mutation. Mutate each of these copies inversely proportionally to the fitness of its parent cell, but keep the parent cell. The mutation follows

c' = c + α N(0, 1),
α = (1/β) exp(-f),  (7)

where c' is a mutated version of individual c, N(0, 1) is a Gaussian random variable of zero mean and standard deviation σ = 1, β is a parameter that controls the decay of an inverse exponential function, and f is the fitness of an individual normalized in the interval [0, 1]. A mutation is only accepted if the mutated individual c' is within its range of domain.
(2.4) Fitness evaluation. Determine the fitness of all new (mutated) individuals of the population.
(2.5) Selection. For each clone group formed by the parent individual and its mutated offspring, select the individual with the highest fitness and calculate the average fitness of the selected population.
(2.6) Local convergence. If the average fitness of the population is not significantly different from the one at the previous iteration, then continue, else return to step (2.1).
(2.7) Network interactions. Determine the affinity (degree of similarity measured via the Euclidean distance) of all individuals of the population. Suppress (eliminate) all but the highest-fitness one of those individuals whose affinities are less than a suppression threshold σs, and determine the number of network individuals, named memory cells, after suppression.
(2.8) Diversity introduction. Introduce a percentage d% of randomly generated individuals and return to step (2).
(3) EndWhile.
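The whole loop can be condensed into a short sketch. The Python fragment below is our own illustration of steps (2.1)-(2.8), not the authors' implementation; the parameter values, the bounds, and the fitness-normalization details are illustrative choices, and the test function is of the kind used later in Section 3.3 (equation (8)).

```python
import numpy as np

rng = np.random.default_rng(2)

def f(p):
    """Bidimensional test function in the spirit of (8)."""
    x, y = p[..., 0], p[..., 1]
    return x * np.sin(4 * np.pi * x) - y * np.sin(4 * np.pi * y + np.pi) + 1

def opt_ainet(n_init=10, n_clones=8, beta=10.0, sigma_s=0.2, d_frac=0.4,
              n_outer=30, n_inner=15, lo=-1.0, hi=1.0):
    pop = rng.uniform(lo, hi, size=(n_init, 2))
    for _ in range(n_outer):
        for _ in range(n_inner):                      # steps (2.1)-(2.5)
            fit = f(pop)
            span = fit.max() - fit.min()
            fn = (fit - fit.min()) / (span + 1e-12)   # fitness in [0, 1]
            alpha = (1.0 / beta) * np.exp(-fn)        # mutation size, eq. (7)
            survivors = []
            for cell, a in zip(pop, alpha):
                clones = cell + a * rng.standard_normal((n_clones, 2))
                clones = np.clip(clones, lo, hi)      # stay inside the domain
                group = np.vstack([cell[None, :], clones])    # keep the parent
                survivors.append(group[np.argmax(f(group))])  # elitist pick
            pop = np.array(survivors)
        # Step (2.7): suppress all but the fittest of mutually close cells.
        keep = []
        for i in np.argsort(-f(pop)):
            if all(np.linalg.norm(pop[i] - pop[j]) > sigma_s for j in keep):
                keep.append(i)
        pop = pop[keep]
        # Step (2.8): diversity introduction.
        n_new = max(1, int(d_frac * len(pop)))
        pop = np.vstack([pop, rng.uniform(lo, hi, size=(n_new, 2))])
    return pop

memory = opt_ainet()   # memory cells, one near each located optimum
```

The suppression radius `sigma_s` plays the role of the suppression threshold σs, and the `d_frac` newcomers implement the diversity introduction that keeps the search exploratory.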
The original stopping criterion proposed for the algorithm is based on the number of memory cells. After the
network interactions (step (2.7)), a certain number of individuals remain. If this number does not vary from one iteration to the other, then the network is said to have a stable
population size. In such condition, the remaining individuals
are all memory cells corresponding to local optima solutions.
However, in accordance with the classical modus operandi in
adaptive equalization, a maximum number of iterations was
adopted as the stopping criterion.
For a more computational description of the immune algorithm presented, the reader is invited to visit the
website http://www.cs.ukc.ac.uk/people/staff/jt6/aisbook/aisimplementations.htm, from where the original Matlab code
for the opt-aiNet and many other immune algorithms can
be downloaded.
3.3. How does opt-aiNet work?

The behavior of the opt-aiNet adaptation procedure can be


simply explained. In steps (2.1) to (2.5), a local search is
being performed based on the clonal selection theory. At
each iteration, a population of individuals is locally optimized through reproduction, affinity-proportional mutation,
and selection (exploitation of the search space). The fact
that no parent individual has a selective advantage over the
others contributes to the multimodal search of the algorithm.
Steps (2.6) to (2.8) check for the convergence of the local
search procedure, eliminate redundant individuals, and introduce diversity in the population. When the initial population reaches a stable state (determined by the stabilization of
its average fitness), the cells interact with each other in a network form, that is, the Euclidean distance between each pair
of individuals is determined, and some of the similar cells
are eliminated to avoid redundancy. In addition, a number
of randomly generated individuals are added to the current
population, allowing for a broader exploration of the search
space, and the process of local optimization restarts in step
(2.1).
To illustrate the behavior of the opt-aiNet algorithm, assume the simple bidimensional function
f(x, y) = x sin(4πx) - y sin(4πy + π) + 1  (8)

to be maximized.
Figure 2a depicts f(x, y) and an initial population of 13 individuals after the local search part of the algorithm was completed for the first time (steps (2.1) to (2.6)). Note that all the remaining 13 individuals are positioned in peaks of the function. Figure 2b depicts the function to be optimized after the convergence of the algorithm. In this case, nearly all peaks of the function were determined, including the four global optima and all local optima of very low values in comparison with the highest peaks.

[Figure 2: Illustrative performance of the opt-aiNet algorithm when applied to the function described in (8): (a) population after the first local search phase; (b) population after convergence.]
3.4. Opt-aiNet and other search techniques
The algorithm described in this paper is most often characterized as an immune algorithm since it is inspired by the immune system. Nevertheless, the similarities between some immune algorithms and EAs are striking and deserve remarks.
EAs can be defined as search and optimization strategies
with their origins and inspiration in the biological processes
of evolution [11]. For an algorithm to be characterized as
evolutionary, it has to present a population of individuals
that are subjected to reproduction, genetic variation, and selection. Therefore, most EAs are comprised of the following
main steps: (1) reproduction with inheritance, (2) selection,
and (3) genetic variation [12].
If one looks into the clonal selection theory of the immune system, briefly reviewed in Section 3.1 and used as a

part of the opt-aiNet algorithm, it is clear that the main steps


of an EA (reproduction, selection, and variation) are embodied in the clonal selection procedure. Steps (2.1) to (2.3) of the
opt-aiNet algorithm correspond to the clonal selection principle of the immune system. These can be likened to a genetic algorithm [13] with no crossover and elitist selection,
or to the evolution strategies originally proposed by Schwefel
[14].
However, it is important to remark that a number of differences exist among them, in addition to their sources of
inspiration. For instance, in opt-aiNet, no coding of the individuals is performed, as in the case of genetic algorithms,
the mutation rate of each individual is inversely proportional to fitness (an original approach inspired by some immune mechanisms), and a deterministic and elitist selection
scheme is adopted.
Another remarkable difference between opt-aiNet and any EA is the presence of direct interactions (connections) between the network individuals (cells). In opt-aiNet, as individuals are connected with each other in a network-like structure, a dynamic control of the population size can be performed. We will not go much further into specific differences between these algorithms, but the interested reader is invited to refer to [5, 15] for additional discussions.
Since all the evolutionary steps are embodied in the adaptive procedure of opt-aiNet, it is possible to consider EAs to be particular cases of opt-aiNet. From the opposite viewpoint, it is possible to claim that the opt-aiNet algorithm is nothing but a new type of evolutionary approach inspired by the immune system, for it contains the main steps of reproduction, variation, and selection that an algorithm needs in order to be characterized as evolutionary. Regardless of which algorithm can be viewed as a particular case of the other, it is important to note that both are adaptive systems suitable for exploratory search. There is a main difference in performance, however, since opt-aiNet is intrinsically suitable for performing multimodal search, while EAs require modifications to tackle such problems.
Empirical comparisons could also be performed between the opt-aiNet algorithm and other search procedures, such as simulated annealing [16] and particle swarm optimization techniques [17]. However, as the nature of the optimal Wiener equalizer problem requires an algorithm capable of efficiently locating multiple solutions, the performances of these algorithms are not expected to be competitive with those presented by EAs with niching and by the opt-aiNet algorithm. Nevertheless, empirical investigation must still be undertaken in order to validate this claim.
4. SIMULATION RESULTS

In order to evaluate the performance of the opt-aiNet algorithm when applied to the search for optimal Wiener equalizers, three different channels (C1, C2, and C3) were

Blind Search for Optimal Equalizers Using Immune Networks


Table 1: Simulation parameters.

Parameter | Value
Initial population | 5
Suppression threshold (σs) | 0.35
Number of offsprings per cell | 10
β (equation (7)) | 50
Maximum number of iterations | 1000
Number of runs | 100

Table 2: Results of C1 and 8-coefficient equalizer.

Solution | MSE | Freq. (GA + niching) | Freq. (opt-aiNet)
Wopt | 0.1293 | 48% | 82%
W2 | 0.1397 | 22% | 17%
W3 | 0.1445 | 12% | 1%
W4 | 0.1533 | 10% | -
W5 | 0.1890 | 4% | -
W6 | 0.1951 | 4% | -

Table 3: Results of C2 and 7-coefficient equalizer.

Solution | MSE | Freq. (GA + niching) | Freq. (opt-aiNet)
Wopt | 0.0312 | 48% | 100%
W2 | 0.0458 | 40% | -
W3 | 0.0917 | 8% | -
W4 | 0.0918 | 2% | -
W5 | 0.1022 | 2% | -

Table 4: Results of C3 and 12-coefficient equalizer.

Solution | Residual MSE | Freq. (opt-aiNet)
Wopt | 0.0071 | 66%
W2 | 0.0075 | 32%
W3 | 0.0104 | 2%

Table 5: Results of C3 and 12-coefficient equalizer (with β = 100).

Solution | Residual MSE | Freq. (opt-aiNet)
Wopt | 0.0071 | 84%
W1 | 0.0075 | 16%

considered. Their transfer functions are as follows:
H_C1(z) = 1 + 0.4z^{-1} + 0.9z^{-2} + 1.4z^{-3},
H_C2(z) = 1 + 1.2z^{-1} - 0.3z^{-2} + 0.8z^{-3},   (9)
H_C3(z) = 1 + 0.6z^{-1} - 0.7z^{-2} + 2.5z^{-3}.
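For concreteness, each of these channels is just an FIR filter. The sketch below, in plain Python with helper names of our own choosing, filters a symbol sequence with the channel taps (the negative taps follow the sign reconstruction of the transfer functions above):

```python
# Taps of H_C1(z), H_C2(z), H_C3(z) in increasing powers of z^-1
# (signs as reconstructed from the text).
CHANNELS = {
    "C1": [1.0, 0.4, 0.9, 1.4],
    "C2": [1.0, 1.2, -0.3, 0.8],
    "C3": [1.0, 0.6, -0.7, 2.5],
}

def channel_output(symbols, taps):
    """FIR filtering: y[n] = sum_k h[k] * s[n - k]."""
    return [
        sum(taps[k] * symbols[n - k]
            for k in range(len(taps)) if 0 <= n - k < len(symbols))
        for n in range(len(symbols))
    ]

# Example: a short BPSK burst through channel C1.
received = channel_output([1.0, -1.0, 1.0, 1.0, -1.0], CHANNELS["C1"])
```

The equalizer then has to undo this convolution blindly, from `received` alone.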


C1 and C2 are nonminimum phase channels and C3 has maximum phase. The equalizer, as mentioned in Section 2, is always an FIR filter with L coefficients. We estimate the CM cost function through time averaging and use the mapping
J_FIT = 1 / (1 + J_CM)   (10)

to generate the fitness. The basic idea behind this conversion is to transform minima into maxima.
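As a minimal sketch, the mapping of equation (10) is a one-liner (the function name is ours):

```python
def cm_to_fitness(j_cm):
    """Equation (10): J_FIT = 1 / (1 + J_CM).
    A nonnegative CM cost of 0 maps to the maximum fitness of 1,
    and larger costs map monotonically toward 0, so CM minima
    become fitness maxima."""
    return 1.0 / (1.0 + j_cm)
```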
We used the immune network model, as discussed in Section 3.2, to obtain the CM global minimum for these channels. The best individual was refined by the aforementioned DD algorithm (6) and compared with the Wiener solutions. This procedure allows a direct verification of the potential of the proposed method. The results are presented in terms of convergence rates to different minima, which favors a straightforward performance analysis.
The default values of the parameters used to run the opt-aiNet algorithm are presented in Table 1.
The first test was performed with channel C1 and an 8-coefficient equalizer. The results are summarized in Table 2, together with the equivalent outcome produced by the GA benchmark [6]. In all tables, Wopt, W2, W3, and so forth stand for the various Wiener minima (ranked according to their MSE).
The results demonstrate that the immune network was able to find the global optimum in most cases, thus surpassing the GA by a great margin. It is also relevant to observe that when global convergence did not occur, the rule was to pick W2, contrary to what the benchmark outcome reveals.


The second test was carried out with channel C2 and a 7-coefficient equalizer. The results are presented in Table 3, together with the GA performance.
In this case, the results are even more impressive; the immune network was capable of determining the best minimum in all runs. Again, the proposal led to results far superior to those achieved by the GA.
Finally, channel C3 and a 12-coefficient equalizer were considered. We chose this equalizer length to increase the size of the search space, thus increasing the problem difficulty. There is no available benchmark in this case. Table 4 presents the results for the opt-aiNet algorithm.
The global convergence rate is lower than that of the previous test cases. However, simulation performances such as the one illustrated in Section 3, and previous experience with machine-learning techniques, encouraged us to try to improve the performance of the algorithm by varying some of its adaptation parameters. Based upon the discussion presented in [5, 10] concerning the importance of each parameter, β was changed to β = 100. This choice leads to a more precise local search, that is, the capability of dealing with the MSE similarity between Wopt and W2. Table 5 depicts the results.
By simply fine-tuning the local search of opt-aiNet, a great improvement in its performance could be observed. The method once more proved itself capable of achieving optimal performance in the vast majority of trials.
The results presented so far are good indicators of the opt-aiNet's potential to locate the global optima. However, it is known that this algorithm is capable of determining most local optima of a given problem, as illustrated in Figure 2b. To study how the multimodal search of opt-aiNet works on problems C1 to C3, assume, without



filters were used for evaluation and comparison. The results were very favorable to the opt-aiNet algorithm, which can also be understood as an evolutionary search technique inspired by the immune system.
These investigations support the establishment of CM-based evolutionary search as a strong paradigm for optimal blind equalization.
A natural extension of this work is the testing of the opt-aiNet algorithm with its automatic stopping criterion, so that the number of user-defined parameters of the algorithm [10, 15] could be reduced. Further studies also involve the use of opt-aiNet in the context of nonlinear equalization, prediction, and identification.


ACKNOWLEDGMENTS


Romis Attux would like to thank FAPESP for its financial support. Ricardo Suyama, Leandro Nunes de Castro (Profix 540396/01-0), and Fernando Von Zuben (300910/96-7) would like to thank CNPq for the financial support.

Table 6: Individuals of the population and associated Wiener receivers. [Thirteen 8-coefficient equalizer vectors from a typical run on channel C1; the Wiener solution closest to each individual is, in order: Wopt, Wopt, W2, W2, W3, W3, W4, W4, W6, W6, W5, W5, and W7.]

loss of generality, the particular case of channel C1. The first column of Table 6 presents some of the individuals of a typical run of opt-aiNet when applied to C1. In the vicinity of each individual, we find an associated Wiener solution, presented in column 2 of Table 6 (occasional sign discrepancies are inevitable in blind equalization). A close inspection reveals seven different Wiener optima, including the global minimum. This property of diversity maintenance confirms the capability of multimodal exploration inherent to the immune network approach.
5. DISCUSSION AND FUTURE TRENDS

This work started by claiming that there is a strong relationship between the CM global optima and some of the Wiener solutions, so that such solutions can be attained by refining the CM minima using a simple DD technique. On the other hand, the CM global optimum can be easily reached by means of a blind search procedure, such as an EA. Therefore, the combination of the CM criterion with an efficient global search procedure gives rise to a framework to design optimal Wiener filters. This is the core of our proposal.
Our approach uses an immune-based algorithm, named opt-aiNet, to optimize the parameters of the equalizer, and benchmarks its performance against that obtained by using a genetic algorithm with niching. Different channels and

REFERENCES
[1] D. N. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Trans. Communications, vol. 28, no. 11, pp. 1867–1875, 1980.
[2] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, USA, 3rd edition, 1996.
[3] C. R. Johnson, P. Schniter, T. J. Endres, J. D. Behm, D. R. Brown, and R. A. Casas, "Blind equalization using the constant modulus criterion: a review," Proceedings of the IEEE, vol. 86, no. 10, pp. 1927–1950, 1998.
[4] H. Zeng, L. Tong, and C. R. Johnson, "An analysis of constant modulus receivers," IEEE Trans. Signal Processing, vol. 47, no. 11, pp. 2990–2999, 1999.
[5] L. N. de Castro and J. Timmis, "An artificial immune network for multimodal function optimization," in Proc. IEEE Congress of Evolutionary Computation (CEC '02), vol. 1, pp. 699–704, Honolulu, Hawaii, USA, May 2002.
[6] A. M. Costa, R. R. F. Attux, and J. M. T. Romano, "A new method for blind channel identification with genetic algorithms," in Proc. IEEE International Telecommunications Symposium, Natal, Brazil, September 2002.
[7] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer-Verlag, London, UK, 2002.
[8] F. M. Burnet, The Clonal Selection Theory of Acquired Immunity, Cambridge University Press, Cambridge, UK, 1959.
[9] N. K. Jerne, "Towards a network theory of the immune system," Ann. Immunol. (Inst. Pasteur), vol. 125C, pp. 373–389, 1974.
[10] L. N. de Castro and F. J. Von Zuben, "aiNet: an artificial immune network for data analysis," in Data Mining: A Heuristic Approach, chapter XII, pp. 231–259, Idea Group Publishing, Hershey, Pa, USA, 2001.
[11] T. Bäck, D. B. Fogel, and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators, Institute of Physics Publishing (IOP), Bristol, UK, 2000.
[12] W. Atmar, "Notes on the simulation of evolution," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 130–148, 1994.
[13] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, Mass, USA, 2nd edition, 1992.



[14] H.-P. Schwefel, "Kybernetische Evolution als Strategie der experimentellen Forschung in der Strömungstechnik," Diploma thesis, Technical University of Berlin, Berlin, Germany, March 1965.
[15] L. N. de Castro and F. J. Von Zuben, "Learning and optimization using the clonal selection principle," IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.
[16] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671–680, 1983.
[17] J. Kennedy, R. Eberhart, and Y. Shi, Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, Calif, USA, 2001.
Romis Ribeiro de Faissol Attux was born in Goiânia, Brazil, in 1978. He received the B.S. and M.S. degrees, both in electrical engineering, from the State University of Campinas (Unicamp), Campinas, Brazil, in 1999 and 2001, respectively. Currently, he is a doctorate student at the same institution. His main research interests are blind equalization and identification, adaptive nonlinear filtering, evolutionary computation, and dynamical systems.

Murilo Bellezoni Loiola was born in São Paulo, Brazil, in 1979. In 2002, he received his B.S. degree in electrical engineering from the State University of Campinas (Unicamp), Campinas, Brazil, where he is currently an M.S. student. His main research interests include turbo equalization, smart antennas, artificial neural networks, and evolutionary algorithms.

Ricardo Suyama was born in São Paulo, Brazil, in 1978. He received the B.S. degree in electrical engineering from the State University of Campinas (Unicamp), Campinas, Brazil, where he is currently an M.S. student. His research interests include adaptive equalization, adaptive nonlinear filtering, smart antennas, and evolutionary algorithms.

Leandro Nunes de Castro received the B.S. degree in electrical engineering from the Federal University of Goiás, Goiânia, Brazil, in 1996, and the M.S. degree in control engineering and the Ph.D. degree in computer engineering from the State University of Campinas, Campinas, São Paulo, Brazil, in 1998 and 2001, respectively. He was a Research Associate with the Computing Laboratory at the University of Kent, Canterbury, UK, from 2001 to 2002, and is currently a Visiting Lecturer at the School of Computer and Electrical Engineering at Unicamp. His research interests include artificial immune systems, artificial neural networks, evolutionary algorithms, and artificial life. Dr. de Castro is a member of the IEEE and the SBA (Brazilian Society of Automation). He has been a referee for a number of conferences and journals related to computational intelligence, such as the Soft Computing Journal, IEEE Transactions on Evolutionary Computation, and IEEE Transactions on Neural Networks.

Fernando José Von Zuben received his B.S. degree in electrical engineering in 1991. He received his M.S. degree in 1993 and his Ph.D. degree in 1996, both in automation, from the Faculty of Electrical and Computer Engineering at the State University of Campinas, SP, Brazil. Since 1997, he has been an Assistant Professor in the Department of Computer Engineering and Industrial Automation at the State University of Campinas, SP, Brazil. The main topics of his research are artificial neural networks, artificial immune systems, evolutionary algorithms, nonlinear control systems, and multivariate data analysis. F. Von Zuben is a member of IEEE, INNS, and AAAI.

João Marcos Travassos Romano was born in Rio de Janeiro in 1960. He received the B.S. and M.S. degrees in electrical engineering from the State University of Campinas (Unicamp), Brazil, in 1981 and 1984, respectively. In 1987, he received the Ph.D. degree from the University of Paris XI. In 1988, he joined the Communications Department of the Faculty of Electrical and Computer Engineering, Unicamp, where he is now a Professor. He served as an Invited Professor at the University René Descartes in Paris during the winter of 1999, and at the Communications and Electronics Laboratory at CNAM, Paris, during the winter of 2002. He is responsible for the Signal Processing for Communications Laboratory. His research interests concern adaptive and intelligent signal processing and its applications in telecommunications problems such as channel equalization and smart antennas. Since 1988, he has been a recipient of the Research Fellowship of CNPq, Brazil. Professor Romano is a member of the IEEE Electronics and Signal Processing Technical Committee and an IEEE Senior Member. Since April 2000, he has been the President of the Brazilian Communications Society (SBrT), a Sister Society of ComSoc, IEEE.

EURASIP Journal on Applied Signal Processing 2003:8, 748–756
© 2003 Hindawi Publishing Corporation


Evolutionary Computation for Sensor Planning: The Task Distribution Plan
Enrique Dunn
Departamento de Electrónica y Telecomunicaciones, División de Física Aplicada, Centro de Investigación Científica y de Educación Superior de Ensenada, 22860 Ensenada, BC, Mexico
Email: edunn@cicese.mx

Gustavo Olague
Departamento de Ciencias de la Computación, División de Física Aplicada, Centro de Investigación Científica y de Educación Superior de Ensenada, 22860 Ensenada, BC, Mexico
Email: olague@cicese.mx
Received 29 June 2002 and in revised form 29 November 2002
Autonomous sensor planning is a problem of interest to scientists in the fields of computer vision, robotics, and photogrammetry. In automated visual tasks, a sensing planner must make complex and critical decisions involving sensor placement and the sensing task specification. This paper addresses the problem of specifying sensing tasks for a multiple manipulator workcell given an optimal sensor placement configuration. The problem is conceptually divided into two different phases: activity assignment and tour planning. To solve these problems, an optimization methodology based on evolutionary computation is developed. Operational limitations originating from the workcell configuration are considered using specialized heuristics, as well as a floating-point representation based on the random keys approach. Experiments and performance results are presented.
Keywords and phrases: sensor planning, evolutionary computing, combinatorial optimization, random keys.

1. INTRODUCTION

Sensor planning is a growing research area which studies the development of sensing strategies for computer vision tasks [1]. The goal of such planning is to determine, as autonomously as possible, a group of sensing actions that lead to the fulfillment of the vision task objectives. This is important because there are environments (i.e., dynamic environments with physical and temporal constraints) and tasks (i.e., scene exploration, highly accurate reconstruction) where the specification of an adequate sensing strategy is not a trivial endeavor. Moreover, an effective planner must make considerations that require complex spatial and temporal reasoning based on a set of mathematical models dependent on the vision task goals [2]. Indeed, difficult numerical and combinatorial problems arise, presenting a rich variety of research opportunities. Our approach is to state such problems in optimization terms and apply evolutionary computation (EC) methodologies in their solution [3].
The problem of visual inspection of a complex three-dimensional object requires the acquisition of multiple object images from different viewpoints [4]. Accordingly, to formulate a sensing strategy, an effective planner must consider how the spatial distribution of viewpoints affects a specific task goal, what an adequate configuration for an individual sensor is, and how the sensing actions will be executed. These are the kind of general considerations that call for the use of a flexible computing paradigm like EC. This work presents the ongoing development of the EPOCA [5] sensor planning system, giving special attention to the task distribution problem that emerges from a multiple manipulator workcell [6].
The literature provides multiple examples of work dealing with automated sensing planning systems which consider
a manipulator using a camera-in-hand configuration. The
HEAVEN system developed by Sakane et al. [7] is an example in which the camera and light illumination placement
are studied. The MVP system developed by Abrams et al.
[8] considered the viewpoint planning of one manipulator
monitoring the movements of a second robot. The work developed by Triggs and Laugier [9] considers workspace constraints of a robot carrying a camera with the goal of automated inspection. More recently, Whaite and Ferrie [10]
developed an uncertainty based approach for autonomous
exploration using a manipulator robot. The next best view
problem for automated surface acquisition working with
a range scanner has been addressed by Pito [11]. Marchand and Chaumette [12] studied optimal camera motion in



active vision systems for 3D reconstruction and exploration.
Ye and Tsotsos [13] developed a sensor planner system for 3D
object search applied in mobile robotics. However, none of
these systems have studied the problem of assigning and sequencing the best order of movements that a multiple robot
system needs to perform.
This paper is organized as follows. First, the problem
statement is given in Section 2. Then, our approach to the
task distribution problem using EC is presented in Section 3.
In this section, we address the aspects of search space reduction, solution representation, and search heuristics. Experimental results are presented next in order to demonstrate the
validity and usefulness of the solution. Finally, conclusions
and guidelines for future research are provided to end the
paper.
Figure 1: Photogrammetric network simulation of four robots.

2. PROBLEM STATEMENT

The automation of visual inspection tasks can be achieved


with the use of manipulator robots, see Figure 1. However, the incorporation of such devices makes additional demands on a sensing planner. In this example, each camera is mounted on the robot hand with the goal of measuring the box on the table. Also, additional floating cameras represent a set of desired viewpoints. The sensing plan
must consider not only the constraints and objectives of the
particular visual task but also the operational restrictions
imposed by the workcell. Additionally, in the case where
multiple manipulators are equipped with digital cameras, a
problem of robot coordination needs to be resolved. More
precisely, sensing actions need to be distributed among the various sensing stations, and an efficient task specification for the entire workcell should be determined. The EPOCA network design module can determine an optimal sensing configuration for multiple cameras converging on a three-dimensional object [14]. We use this configuration as input for our task distribution problem in the proposed multiple robot workcell. It is assumed that the robots move in straight lines between different viewpoints and that each robot must start and finish each tour from a predetermined configuration. In this way, the problem of specifying an efficient task distribution for the manipulator robots consists of the following.
(1) Assigning to each of the robots a set of viewpoints from
which to obtain an image, see Figure 2. In other words,
determining how many and which viewpoints are to be
assigned to each robot.
(2) Deciding on an optimal tour for each of the robots, see Figure 3. This involves specifying the correct order of each viewpoint in a robot's tour.
In this way, we have two of the most difficult combinatorial problems in computer science, namely the set partition and traveling salesman problems; see Figures 2 and 3 for a graphical interpretation. Actually, our task distribution problem consists of a multiple traveling salesman problem instance. The goal is to specify the optimal

Figure 2: Activity assignment. Each viewpoint is assigned to one of the robots, forming different excluding sets.

Figure 3: Tour planning. Each of the sets is ordered, specifying the tour to be followed by each of the robots.

combination of multiple subtours, with the requirement that


every viewpoint specified by the EPOCA network configuration module is visited. In order to describe our task distribution problem, the following definitions are given.


Definition 1 (Photogrammetric network). A photogrammetric network is represented as an ordered set V of n three-dimensional viewpoints. Each individual viewpoint is expressed as V_j, where j ranges from 1 to n.

Definition 2 (Robot workcell). A multirobot active vision system is represented by an ordered set R consisting of the r robots in the workcell. Each individual robot is expressed by R_i, where i ranges from 1 to r.

Definition 3 (Operational environment). Each robot has an operationally restricted physical space denoted by O_i, where i ranges from 1 to r.

Accordingly, the problem statement can be expressed as follows.

Definition 4 (Task distribution problem). Find a set of r ordered subsets X_i ⊆ V, with V = ∪_{i=1}^{r} X_i and V_j ∈ X_i ⟹ V_j ∈ O_i, such that the total length traveled by the robots is minimized.
From the above definitions, the activity assignment problem relates each of the n elements of V with one of the r possible elements of R. Considering that each robot R_i has been assigned n_i viewpoints, a problem of sequencing the viewpoints emerges, which we call tour planning. Our goal is to find the best combination of activity assignment and tour planning in order to optimize the overall operational cost of the task distribution. This total operational cost is produced by adding the individual tour costs Q_i, defined by the Euclidean distance that each robot needs to travel in straight lines among the different viewpoints. Hence, the criterion is represented as Q_T = ∑_{i=1}^{r} Q_i. Such a problem statement yields a combinatorial problem which is computationally NP-hard and requires the use of special heuristics in order to avoid an exhaustive search.
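In code, the criterion Q_T is just a sum of closed-tour Euclidean lengths. The sketch below (helper names are ours) assumes, as stated above, that each robot starts and finishes its tour at a predetermined configuration:

```python
import math

def tour_cost(start, viewpoints):
    """Q_i: Euclidean length of one robot's tour, from its predetermined
    start configuration through its assigned viewpoints, in order,
    and back to the start."""
    path = [start] + list(viewpoints) + [start]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def total_cost(starts, tours):
    """Q_T = sum_{i=1..r} Q_i over all r robots in the workcell."""
    return sum(tour_cost(s, t) for s, t in zip(starts, tours))
```

A robot with no assigned viewpoints contributes a cost of zero, since its path degenerates to its start configuration.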
3. EC APPROACH TO TASK DISTRIBUTION

Our problem is presented as a combinatorial optimization


problem with a large search space. An optimization method
based on genetic algorithms is proposed. To obtain a quality
solution, three key aspects need to be addressed: search space
reduction, solution representation, and search heuristics. The
following sections present our approach to these key aspects
in order to develop a global optimization method to solve the
task distribution problem.
3.1. Search space reduction
Combinatorial problems generally have to satisfy a given set
of competing restrictions. In our task distribution problem,
some of these restrictions are straightforward; that is, each
viewpoint should be assigned to only one robot, each viewpoint should be visited only once inside a robot tour. On
the other hand, implicit restrictions, like the accessibility of a
robot to a particular viewpoint, need to be determined. Consideration of such restrictions can help reduce the size of the
search space. This is relevant because in practice a manipulator has a limited workspace, see Figure 4.

Figure 4: Operational restrictions. The workcell configuration imposes accessibility restrictions. Hence, when a robot's reach is limited, it is possible to reduce the search space for the activity assignment phase.

Table 1: Structure ACCESSIBILITY containing the number and the list of robots capable of reaching a particular viewpoint.

Viewpoint ID | Number of robots | List of robot IDs
V_1 | r_1 | RobID_1, ..., RobID_{r_1}
... | ... | ...
V_n | r_n | RobID_1, ..., RobID_{r_n}

The method by which such restrictions are computed is presented next.
Assuming a static and obstacle-free environment, it is reasonable to compute each robot's accessibility for a given position and orientation by solving the robot's inverse kinematics problem. In this work, we consider the PUMA560 manipulator, which has six degrees of freedom. A three-dimensional computer graphics simulation environment was developed in order to visualize such accessibility restrictions. Multiple manipulators were considered in our computer simulation. The inverse kinematics problem was solved for every robot at each viewpoint. The cases where a robot could access a viewpoint were stored in an auxiliary data structure called ACCESSIBILITY. This structure contains an entry for every viewpoint V_j in order to keep a record of how many and which robots are capable of reaching that particular viewpoint, see Table 1. Such values remain constant throughout the course of task execution; therefore, they only need to be computed once. The above method evaluates the restrictions imposed by the physical arrangement of the workcell, as well as the robot revolute joint limitations. Such operational restrictions are incorporated implicitly as an intrinsic element of our optimization method.
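A minimal sketch of this precomputation follows; the names are ours, and `reachable` stands in for the inverse kinematics test, which in the paper is solved per robot and viewpoint for the PUMA560:

```python
def build_accessibility(viewpoints, robots, reachable):
    """Precompute, for every viewpoint index j, how many and which robots
    can reach it. `reachable(robot, viewpoint)` must return True when the
    robot's inverse kinematics admits that position and orientation."""
    table = {}
    for j, v in enumerate(viewpoints):
        ids = [i for i, r in enumerate(robots) if reachable(r, v)]
        table[j] = {"count": len(ids), "robot_ids": ids}
    return table
```

Because the workcell is static, the table is computed once and then consulted repeatedly by the assignment heuristics of Section 3.3.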
3.2. Solution representation

A representation similar to random keys [15] is proposed. In this representation, each viewpoint V_j is assigned a random value S_j in the range (0, 1), allowing for the implementation of very straightforward genetic operators. These



Figure 5: Solution encoding. Each of the n viewpoints is assigned a random floating-point value S_i in the range (0, 1). These values are stored in a string S.

Table 2: Structure TASKS containing the list of viewpoints comprising each robot tour T_i.

values are stored in a representation string denoted by S. Since there are n different viewpoints, S will consist of n elements, see Figure 5. Random keys use a heuristic we call the smallest-value-first heuristic. In our case, the viewpoint with the smallest corresponding value in S would be the first viewpoint in a given permutation P. The viewpoint with the second smallest value in S would be the second viewpoint in P, and so forth. In this way, the order of a viewpoint V_j inside a given permutation P depends on the magnitude of its corresponding value S_j with respect to all the other values in S. To illustrate, given five viewpoints, a possible representation string can be
S = [0.89, 0.76, 0.54, 0.23, 0.62].   (1)

The smallest value in S is found at the fourth position, denoted by S_4. Therefore, V_4 is the first viewpoint in the resulting permutation P. The second smallest value is found at the third position, S_3, making V_3 the second viewpoint in P, and so on. The resulting permutation of the five viewpoints is

P = (V_4, V_3, V_5, V_2, V_1).   (2)
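The smallest-value-first decoding can be written directly with a sort; a sketch (the function name is ours) reproducing the example above:

```python
def decode_permutation(keys):
    """Random-keys decoding: sort viewpoint indices by their key values,
    smallest first (indices are 1-based, matching the viewpoints V_j)."""
    return [j + 1 for j in sorted(range(len(keys)), key=keys.__getitem__)]

# The five-viewpoint example from the text:
S = [0.89, 0.76, 0.54, 0.23, 0.62]
P = decode_permutation(S)  # [4, 3, 5, 2, 1], i.e., P = (V4, V3, V5, V2, V1)
```

Since any string of distinct keys decodes to a valid permutation, crossover and mutation can operate on the keys freely without producing ill-formed tours.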

The random keys approach can be adapted to solve our task distribution problem. The smallest-value-first heuristic avoids the generation of infeasible solutions common to permutation-based representations. The random keys representation also allows our optimization method to apply genetic operators without the need for additional heuristics.
The convention of encoding a possible solution into a string representation has been specified. The question of how to decode the corresponding solution from such a representation is now considered. Recalling the problem statement, initially there is a set of n viewpoints V_j, and each must be assigned to one of the r possible robots. Using the random keys representation, a possible solution is codified into a string S of n values. As stated in Section 2, we want to optimize the total operational cost Q_T. However, the solution representation S needs to be decoded into an explicit description of the task distribution. Such a description would represent each of the r robot tours. To accomplish this, an auxiliary data structure called TASKS is proposed to represent the global task distribution among the robots, see Table 2. This structure has an entry T_i for each robot R_i, which describes that robot's tour; that is, T_i lists the sequence of viewpoints assigned to that particular robot. Each of these T_i tours is evaluated to obtain an individual tour cost Q_i, from which the total operational cost Q_T is obtained. The question before us now is how to convert a string representation into a corresponding task distribution description. The following
Table 2: The TASKS structure describing the global task distribution.

Robot ID   Number of viewpoints   List of viewpoint IDs
R1         v1                     T1 = [ViewID1, . . . , ViewIDv1]
...        ...                    ...
Rr         vr                     Tr = [ViewID1, . . . , ViewIDvr]

subsection presents the heuristics used by our method to obtain such a task distribution description.
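In code, the two auxiliary tables can be held as plain dictionaries (an illustrative sketch; the entries shown are examples, not data from the experiments):

```python
# ACCESSIBILITY: for each viewpoint, the robots able to reach it (Table 1).
ACCESSIBILITY = {
    "V1": ["R1", "R3", "R4"],
    "V2": ["R1", "R2", "R3"],
}

# TASKS: one entry Ti per robot Ri listing its assigned tour (Table 2).
TASKS = {
    "R1": ["V2"],
    "R3": ["V1"],
}

# The "number of robots" / "number of viewpoints" columns are implicit:
print(len(ACCESSIBILITY["V1"]), len(TASKS["R1"]))  # 3 1
```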
3.3. Search heuristics

A solution representation S needs to be evaluated. Such evaluation is applied to the task distribution description contained in TASKS. Hence, a mapping M : S → TASKS is necessary. The mapping M assigns and sequences the viewpoints among the different robots and stores the results in the structure TASKS. The mapping M makes use of the solution representation data structures S and TASKS, as well as the precomputed operational restrictions stored in ACCESSIBILITY. The two distinct phases, activity assignment and tour planning, are presented separately.
3.3.1. Activity assignment

The activity assignment problem allocates each viewpoint Vj to one of the possible robots. The goal is to provide an initial unsequenced set of individual robot tours Ti using the following steps.
Step 1. Obtain the number rj of robots capable of reaching viewpoint Vj by consulting the ACCESSIBILITY structure, see Table 1.
Step 2. Divide the interval (0, 1) into rj equally sized segments in order to determine the size of a comparison segment Seg = 1/rj.
Step 3. Calculate in which segment k the random value Sj resides, that is, k = Int(Sj / Seg) + 1.
Step 4. Assign the viewpoint Vj to the kth robot in the corresponding entry of the ACCESSIBILITY structure. In this way, the assigned robot index i is given by RobIDk, which is found in the entry that corresponds to Vj in the ACCESSIBILITY table.
Step 5. Append Vj to the list of viewpoints Ti assigned to the ith robot. The tour description Ti is stored in the TASKS structure.
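Steps 1 through 5 amount to a simple segment lookup; a minimal Python sketch (the function and variable names are ours):

```python
def assign_viewpoints(S, accessibility):
    """Steps 1-5: assign each viewpoint to a reachable robot via its key.

    `accessibility[j]` lists the IDs of the robots able to reach
    viewpoint j+1 (the precomputed ACCESSIBILITY table). Returns the
    TASKS mapping: robot ID -> unordered list of assigned viewpoints.
    """
    tasks = {}
    for j, key in enumerate(S):
        robots = accessibility[j]                  # Step 1: reachable robots
        seg = 1.0 / len(robots)                    # Step 2: segment size
        k = min(int(key / seg), len(robots) - 1)   # Step 3: 0-based segment index (clamped in case key == 1.0)
        robot = robots[k]                          # Step 4: kth robot of the entry
        tasks.setdefault(robot, []).append(j + 1)  # Step 5: append the viewpoint to its tour
    return tasks
```

For example, assign_viewpoints([0.3, 0.8], [[1, 2], [1, 2]]) yields {1: [1], 2: [2]}: the first key falls in robot 1's half of the interval, the second in robot 2's.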
A graphical description of these heuristic steps is shown in Figure 6. The actions performed in the activity assignment phase comply with the operational restrictions and thereby ensure that any encoded string S yields a valid solution to the assignment problem. Under this strategy, each possible encoded string S has only one possible interpretation. After executing this series of steps, each viewpoint is assigned to a robot. The viewpoints
assigned to a single robot Ri are grouped into a set Ti.

Figure 6: Activity assignment heuristics. The diagram shows Steps 1 through 4, corresponding to the assignment phase.

Figure 7: Mapping of the representation string values. Each of the values contained in S is adjusted before applying the smallest-value-first heuristic to the values stored in TASKS.

Each Ti represents a tour of viewpoints assigned to that particular robot, and these tours are stored in the structure TASKS. Until this point, the order of the viewpoints inside a given tour has not been specified. This is the problem we approach next.


3.3.2 Tour planning


The tour planning problem consists of correctly sequencing each of the r robot tours Ti stored in the structure TASKS. These tours are initially obtained from the activity assignment phase presented above, in which every viewpoint Vj is assigned to one of the r possible robots Ri. The goal of the tour planning phase is to minimize the total operational cost QT. This situation is equivalent to solving r different traveling salesman problems. The smallest-value-first heuristic can be applied to sequencing problems such as the one presented here. Unfortunately, the rules by which the preceding assignments were made in Steps 1 through 4 produce undesirable tendencies in the representation values Sj that correspond to each tour specification Ti. This is due to the deterministic heuristic applied for robot assignment. As a consequence, the values corresponding to the viewpoints contained in Ti will be, on average, higher than those corresponding to the viewpoints in Ti−1, creating a bias inside each Ti when the smallest-value-first heuristic is applied directly. Therefore, the values inside S need to be adjusted to eliminate these unwanted properties. This is accomplished by the following heuristic steps.
Step 6. Recall which segment k of the range (0, 1) contains the value Sj used in the assignment phase.
Step 7. Calculate the value S′j in the range (0, 1) that reflects the relative position of Sj inside the kth segment. For example, the value 0.70 lies exactly in the middle of the range (0.60, 0.80); hence its corresponding value in the range (0, 1) is 0.5. A graphic description of this heuristic is presented in Figure 7.

Figure 8: Tour planning. The smallest-value-first heuristic is applied to each robot tour considering the previously adjusted values in S′.

Step 8. Update Sj to store the new value S′j.
Step 9. Apply the smallest-value-first heuristic to each of the unordered robot tours Ti using the values stored in S′, see Figure 8.
This series of steps ensures unbiased tour sequencing, empowering the search algorithm to seek out a global optimum in a very large and complex search space more effectively.
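Steps 6 through 9 can be sketched in the same style (again with our own function names; `tasks` is the assignment produced by the previous phase):

```python
def adjust_keys(S, accessibility):
    """Steps 6-8: replace each key by its relative position inside the
    segment used during assignment, removing the assignment bias."""
    adjusted = []
    for j, key in enumerate(S):
        r = len(accessibility[j])      # number of segments used for this viewpoint
        k = min(int(key * r), r - 1)   # Step 6: segment the key fell into (0-based)
        adjusted.append(key * r - k)   # Steps 7-8: rescale to the range (0, 1)
    return adjusted

def sequence_tours(tasks, adjusted):
    """Step 9: smallest-value-first ordering inside each robot tour."""
    return {robot: sorted(tour, key=lambda v: adjusted[v - 1])
            for robot, tour in tasks.items()}
```

As in the text's example for Step 7, a key of 0.70 that fell into the segment (0.60, 0.80) of a five-segment interval is rescaled to 0.5.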
4. EXPERIMENTATION AND RESULTS

The solution presented in the previous sections for the task


distribution problem was incorporated into an extension of
the functionality of the EPOCA system developed by Olague

Evolutionary Computation for Sensor Planning: The Task Distribution Plan


[5]. EPOCA solves the photogrammetric network design


problem for complex objects. The problem of task distribution emerges as a result of the photogrammetric network
design performed by EPOCA. The system can be classified
as an EC-based system that addresses the complex goal of
automating the planning of sensing strategies for accurate
three-dimensional reconstruction.
Two different experiments are presented next: the first is a simple scenario intended to illustrate our method's functionality; the second experiment is somewhat more complex, and its goal is to show the effectiveness and flexibility of our system.
4.1. Experiment A
This experiment consists of eight viewpoints to be distributed among four manipulators. The viewpoints are stacked into four pairs, each pair arranged beneath one of the robots' initial positions, see Figure 9. The optimal task distribution for this example can be obtained using a greedy heuristic. Hence, such an experiment might seem trivial, but it will exemplify our method's functionality.
Operational restrictions are computed first, with the
goal of determining which robots can access a particular
viewpoint. As mentioned in Section 3, to compute such restrictions, the inverse kinematic problem is solved for every robot at each viewpoint. The results of such validations
are stored in the structure ACCESSIBILITY. The physical
arrangement of the robots for Experiment A is such that
every camera can be reached by three different robots, see
Table 3.
The genetic algorithm works with a population of encoded strings, selecting the best individuals for reproduction. The reproduction process combines the characteristics of two selected parent solutions and provides two new offspring solutions which, in turn, will be part of the next generation of solutions. This process is repeated iteratively until a certain number of generations has been executed. At the end of this iterative process, we obtain a set of possible solutions. One of these individuals, which represented the optimal solution, was given by the following random keys representation:
S = [0.72, 0.71, 0.32, 0.14, 0.81, 0.80, 0.27, 0.07].   (3)

Following the assignment heuristic, we determine in which of the possible segments each element Sj resides. For the first viewpoint V1, there are three possible robots to which it can be assigned, see Table 3; hence, the comparison segment Seg = 1/3 ≈ 0.33. In this way, following Steps 1 through 5, the corresponding representation value S1 = 0.72 is determined to lie in the third segment, which is delimited by (0.66, 1.00). Therefore, the robot to be assigned is the third robot in V1's entry of the ACCESSIBILITY structure, in this case RobID = 3. The corresponding robot assigned to each viewpoint Vj is given by


Robot = [R3, R3, R1, R1, R4, R4, R2, R2].   (4)
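The segment rule of Steps 1 through 5 reproduces the assignment in (4) directly from the keys in (3) and the ACCESSIBILITY lists of Table 3 (a sketch with our own variable names):

```python
# Random keys from (3) and the robot lists from Table 3 (IDs as integers).
S = [0.72, 0.71, 0.32, 0.14, 0.81, 0.80, 0.27, 0.07]
ACCESS = [[1, 2, 3], [1, 2, 3], [1, 3, 4], [1, 3, 4],
          [1, 2, 4], [1, 2, 4], [2, 3, 4], [2, 3, 4]]

assigned = []
for key, robots in zip(S, ACCESS):
    k = min(int(key * len(robots)), len(robots) - 1)  # 0-based segment index
    assigned.append(robots[k])

print(assigned)  # [3, 3, 1, 1, 4, 4, 2, 2], matching (4)
```

Note that the worked adjustment in the text uses the rounded third-segment boundaries (0.66, 1.00); recomputing the values of (5) with exact boundaries at multiples of 1/3 shifts some entries by a couple of hundredths.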

Figure 9: Eight viewpoints are to be distributed among four manipulators. Viewpoints are depicted as individual cameras, and solid lines connecting the cameras illustrate each robot tour corresponding to an optimal task distribution.

Table 3: ACCESSIBILITY restrictions calculated for Experiment A, depicted in Figure 9.

Viewpoint ID   Number of robots   List of robot IDs
V1             r1 = 3             R1, R2, R3
V2             r2 = 3             R1, R2, R3
V3             r3 = 3             R1, R3, R4
V4             r4 = 3             R1, R3, R4
V5             r5 = 3             R1, R2, R4
V6             r6 = 3             R1, R2, R4
V7             r7 = 3             R2, R3, R4
V8             r8 = 3             R2, R3, R4

At this point, we have an appropriately assigned set of viewpoints. The values contained in S will now be adjusted in accordance with Steps 6 through 9 so that the smallest-value-first heuristic can be applied to the viewpoints assigned to each robot. For the first viewpoint, the corresponding value S1 is adjusted as follows. Recall that S1 = 0.72 resides in the third segment, which is delimited by (0.66, 1.00). The relative position of 0.72 within this segment, rescaled to the range (0, 1), is 0.18. Applying these steps to every value in S yields
S′ = [0.18, 0.15, 0.96, 0.42, 0.45, 0.42, 0.81, 0.21].   (5)

Once the values in S have been adjusted, applying the smallest-value-first heuristic rearranges TASKS as shown in Table 4.
Twenty trials were executed, and this global minimum distribution was reached in every single execution, in an average of 15.1 generations.
4.2. Experiment B

This experiment presents a complex planar object which is


measured by four manipulators. The goal is to distribute the


Table 4: TASKS for an optimal solution in Experiment A after the tour planning phase.

Robot ID   Number of viewpoints   List of viewpoint IDs
R1         2                      T1 = [V4, V3]
R2         2                      T2 = [V8, V7]
R3         2                      T3 = [V2, V1]
R4         2                      T4 = [V6, V5]

Figure 12: Another solution found by the system that corresponds to the configuration shown in Figure 10.

Table 5: ACCESSIBILITY restrictions calculated for Experiment B, depicted in Figure 10.

Figure 10: Thirteen viewpoints are to be distributed among four manipulators. Viewpoints are depicted as individual cameras.

Figure 11: Best solution found by the genetic algorithm for the configuration shown in Figure 10.

photogrammetric network consisting of 13 cameras in an optimal manner, see Figure 10. Working with this fixed configuration, we executed several tests. First, to test our method's functionality, we executed the task distribution planner. Several possible solutions are obtained over the course of multiple executions; two such solutions are depicted in Figures 11 and 12. Notice that the best solution found, represented in Figure 11, does not incorporate all of the available robots. Figure 12 shows a more typical solution which is also found by our system.
In order to test the method's adaptability, two of the four manipulator robots were disabled. This additional restriction is reflected only in changes to the values stored in Table 5.

Viewpoint ID   Number of robots   List of robot IDs
V1             r1 = 2             R2, R4
V2             r2 = 2             R2, R3
V3             r3 = 2             R1, R4
V4             r4 = 2             R1, R4
V5             r5 = 2             R1, R4
V6             r6 = 2             R2, R3
V7             r7 = 2             R2, R4
V8             r8 = 2             R2, R3
V9             r9 = 2             R1, R3
V10            r10 = 2            R1, R3
V11            r11 = 3            R1, R2, R3
V12            r12 = 3            R1, R2, R4
V13            r13 = 3            R1, R2, R4

The system is expected to distribute the tasks among the two remaining robots. Results from these tests are shown in Figures 13 and 14. In these cases, the activity assignment problem becomes visually simpler to resolve, but the difficulty of the tour planning problem becomes more evident, since each tour will consist of more viewpoints.
Since our approach is based on EC techniques, the determination of the task distribution plan is the product of an evolution process over a population of possible solutions. Therefore, the fitness values of each of these individuals, and of the population in general, reflect the effect of such evolution. In this way, the population fitness values evolve over the course of several generations until an optimal solution is found, see Figure 15. The stepwise decrements of the best fitness line point out the combinatorial aspect of our search, while the average fitness confirms the positive effect of the evolution process.
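The genetic operators used in these experiments (a single-point crossover applied with probability Pc = 0.95 and a per-element additive Gaussian mutation N(0, 0.2) applied with probability Pm = 0.001) can be sketched as follows. This is our own minimal illustration; in particular, clipping mutated genes back into [0, 1] is an assumption the text does not state:

```python
import random

def single_point_crossover(p1, p2, pc=0.95):
    """Single-point crossover on two random-keys strings."""
    if random.random() < pc and len(p1) > 1:
        cut = random.randint(1, len(p1) - 1)  # crossover point
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return list(p1), list(p2)

def gaussian_mutation(chrom, pm=0.001, sigma=0.2):
    """Add N(0, sigma) noise to each gene with probability pm, clipping
    back into [0, 1] so the string stays a valid key vector (our assumption)."""
    return [min(1.0, max(0.0, g + random.gauss(0.0, sigma)))
            if random.random() < pm else g
            for g in chrom]
```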
While great detail has been given to the special heuristics used in our approach, the behavior of the curves presented in Figure 15 and the overall performance depend on the genetic algorithm's operational parameters. A single-point crossover operator, applied with probability Pc = 0.95, was utilized. Furthermore, a mutation operator, which adds to each element of the representation string a value drawn from a normal distribution N(0, 0.2), was applied with probability Pm = 0.001.

Figure 13: Solution found by the system for the case where a pair of robots was disabled from the configuration shown in Figure 10.

Figure 14: An environment similar to Figure 13 showing the system's flexibility to changes in the workcell configuration.

Figure 15: Population fitness over the evolution process.

Figure 16: Genetic algorithm performance over multiple executions. The obtained solutions are always better than a greedy search, reaching the global optimum 14 out of 50 times.

An appreciation of the effectiveness of the proposed methodology is obtained by comparing its solutions against those offered by alternative methodologies. The proposed methodology is compared to an exhaustive search and a greedy heuristic. The results for the fixed configuration shown in Figure 10 are presented in Figure 16. As the figure illustrates, our algorithm consistently outperforms the greedy heuristic in terms of the quality of the proposed solutions. The advantage of the genetic algorithm approach lies in its computational cost: the EC algorithm requires about 3 seconds, against 14 hours for an exhaustive search. On the other hand, our approach reaches the global optimum 28% of the time over the course of 50 executions, coming within an average of 2.9% of the global optimum. As these results reflect, there is an obvious compromise between solution quality and computational efficiency.

5. CONCLUSIONS AND FUTURE WORK

The development of an effective sensor planner for automated vision tasks implies the consideration of operational restrictions as well as the vision task's objectives. This work presents a solution for the task distribution problem inherent to multiple-robot workcells. The problem is conceptualized as two separate combinatorial problems: activity assignment and tour planning. A genetic algorithm-based strategy that concurrently solves these problems was presented along with experimental results. The approach employs auxiliary data structures in order to incorporate accessibility limitations and to specify a task distribution plan. The evolutionary nature of the optimization method allows multiple approximate solutions of the optimization problem to be found over the course of several executions. Performance considerations support the use of the proposed methodology compared to a greedy heuristic or an exhaustive search.
Future work can consider the robot motion planning problem that arises when there are obstacles in the environment or when the manipulators can collide with each other. Also, the representation scheme can be modified to use two values instead of adjusting the original representation string by heuristic means. Furthermore, the genetic operators can be modified to improve the evolutionary algorithm's performance. In addition, a rigorous analysis of the properties of the heuristics used is needed. At present, we are working toward a real implementation of our algorithms for intelligent sensor planning.
ACKNOWLEDGMENTS
This research was funded by Contract 35267-A from CONACyT and under the LAFMI Project. The first author was supported by scholarship 142987 from CONACyT. Figures 1, 2, 3, 4, 9, 10, 11, 12, 13, and 14 were generated with software written at the Geometry Center. The authors thank the anonymous reviewers for their suggestions, which greatly helped improve this paper.
REFERENCES
[1] K. A. Tarabanis, P. K. Allen, and R. Y. Tsai, "A survey of sensor planning in computer vision," IEEE Transactions on Robotics and Automation, vol. 11, no. 1, pp. 86–104, 1995.
[2] J. Miura and K. Ikeuchi, "Task-oriented generation of visual sensing strategies in assembly tasks," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 2, pp. 126–138, 1998.
[3] G. Olague and R. Mohr, "Optimal camera placement for accurate reconstruction," Pattern Recognition, vol. 35, no. 4, pp. 927–944, 2002.
[4] T. S. Newman and A. K. Jain, "A survey of automated visual inspection," Computer Vision and Image Understanding, vol. 61, no. 2, pp. 231–262, 1995.
[5] G. Olague, Planification du placement de caméras pour des mesures 3D de précision, Ph.D. thesis, Institut National Polytechnique de Grenoble, France, October 1998.
[6] G. Olague and E. Dunn, "Multiple robot task distribution: Towards an autonomous photogrammetric system," in Proc. IEEE Systems, Man and Cybernetics Conference, vol. 5, pp. 3235–3240, Tucson, Ariz, USA, October 2001.
[7] S. Sakane, R. Niepold, T. Sato, and Y. Shirai, "Illumination setup planning for a hand-eye system based on an environmental model," Advanced Robotics, vol. 6, no. 4, pp. 461–482, 1992.
[8] S. Abrams, P. K. Allen, and K. A. Tarabanis, "Dynamic sensor planning," in Proc. IEEE International Conf. on Robotics and Automation, Atlanta, Ga, USA, May 1993.
[9] B. Triggs and C. Laugier, "Automatic task planning for robot vision," in Proc. Int. Symp. Robotics Research, Munich, October 1995.
[10] P. Whaite and F. P. Ferrie, "Autonomous exploration: Driven by uncertainty," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 193–205, 1997.
[11] R. Pito, "A solution to the next best view problem for automated surface acquisition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 1016–1030, 1999.
[12] E. Marchand and F. Chaumette, "Active vision for complete scene reconstruction and exploration," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 21, no. 1, pp. 65–72, 1999.
[13] Y. Ye and J. K. Tsotsos, "Sensor planning for 3D object search," Computer Vision and Image Understanding, vol. 73, no. 2, pp. 145–168, 1999.
[14] G. Olague, "Automated photogrammetric network design using genetic algorithms," Photogrammetric Engineering & Remote Sensing, vol. 68, no. 5, pp. 423–431, 2002. Paper awarded the 2003 First Honorable Mention for the Talbert Abrams Award by ASPRS.
[15] J. C. Bean, "Genetic algorithms and random keys for sequencing and optimization," ORSA Journal on Computing, vol. 6, no. 2, pp. 154–160, 1994.
Enrique Dunn received a computer engineering degree from Universidad Autónoma de Baja California in 1999. He obtained the M.S. degree in computer science from CICESE, Mexico, in 2001. Currently, Dunn is working towards the Ph.D. degree at the Electronics and Telecommunications Department, Applied Physics Division, CICESE, Mexico. His research interests include robotics, combinatorial optimization, evolutionary computation, close-range photogrammetry, and 3D simulation. He is a student member of the ASPRS.

Gustavo Olague holds a Bachelor's degree (Honors) in electronics engineering and a Master's degree in computer science from the Instituto Tecnológico de Chihuahua, Mexico, in 1992 and 1995, respectively. He received the Diplôme de Doctorat en Imagerie, Vision et Robotique (Ph.D.) from the Institut National Polytechnique de Grenoble, France, in 1998. From 1999 to 2001, he was an Associate Professor of computer science, and in 2002, he was promoted to Professor in the Applied Physics Division at CICESE, Mexico. Dr. Olague is a member of the ASPRS, ISGEC, IEEE, IEEE Computer Society, IEEE Robotics and Automation Society, IEEE SMC, and RSPSoc. He has served on numerous technical committees and has been invited to lecture at universities in France, Spain, and Colombia. He has served as Chair and Cochair at international conferences such as the ASPRS 2001 and 2003 Close-Range Photogrammetry sessions and the IEEE SMC 2001 Robotics session. He has also held visiting appointments at the Technische Universität Clausthal, Germany, and the LAAS, France. His research interests include robotics, computer vision, and, in particular, the coupling of evolutionary computation with those two research domains (autonomous systems and visual perception). Dr. Olague is the recipient of the 2003 First Honorable Mention for the Talbert Abrams Award.

EURASIP Journal on Applied Signal Processing 2003:8, 757–765
© 2003 Hindawi Publishing Corporation


An Evolutionary Approach for Joint Blind Multichannel Estimation and Order Detection
Chen Fangjiong
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Department of Electronic Engineering, South China University of Technology, Wushan, Guangzhou 510641, China
Email: eefjchen@scut.edu.cn

Sam Kwong
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Email: cssamk@cityu.edu.hk

Wei Gang
Department of Electronic Engineering, South China University of Technology, Wushan, Guangzhou 510641, China
Email: ecgwei@scut.edu.cn
Received 30 May 2001 and in revised form 28 January 2003
A joint blind order-detection and parameter-estimation algorithm for a single-input multiple-output (SIMO) channel is presented. Based on the subspace decomposition of the channel output, an objective function including both the channel order and the channel parameters is proposed. The problem is resolved by a specifically designed genetic algorithm (GA). In the proposed GA, we encode both the channel order and the parameters into a single chromosome, so they can be estimated simultaneously. Novel GA operators and convergence criteria are used to guarantee correct convergence and high convergence speed. Simulation results show that the proposed GA achieves satisfactory convergence speed and performance.
Keywords and phrases: genetic algorithms, SIMO, blind signal identification.

1. INTRODUCTION

Many applications in signal processing encounter the problem of blind multichannel identification. Traditional methods for such identification usually apply higher-order statistics techniques. The major problems of these methods are slow convergence and many local optima [1]. Since the original work of Tong et al. [1, 2], many lower-order statistics-based methods have been proposed for blind multichannel identification (see [3] and the references therein). A common assumption in these methods is that the channel order is known in advance. However, such information is, in fact, not available, so we are obliged to estimate the channel order beforehand. Though many order-detection algorithms can be applied (e.g., see [4]) to solve this particular problem, approaches that separate order detection from parameter estimation may not be efficient, especially when the channel impulse response has small head and tail taps [5].
To tackle this drawback, a class of channel-estimation algorithms performing joint order detection and parameter estimation has been proposed [5, 6]. In [5], a cost function including the channel order and parameters is proposed. However, the algorithm may not be efficient because the channel order is estimated by evaluating all possible candidates from 1 to a predefined ceiling. The method proposed in [6] is also not a truly joint approach, since the order is separately estimated by detecting the rank of an overmodelled data matrix. In fact, this is very similar to the methods in [4] that apply a rank-detection procedure to an overmodelled data covariance matrix. Order estimation via rank detection may not be efficient because it is sensitive to noise [4], and the calculation of the eigenvalue decomposition is also computationally costly.
In this paper, we propose a truly joint order-detection and channel-estimation method based on a genetic algorithm (GA). GAs have been widely used in channel-parameter estimation [7, 8, 9]. However, their application to joint order detection and parameter estimation has not been well explored. Based on the subspace decomposition of the output-autocorrelation matrix, we first develop a new objective function for estimating the channel order and parameters. Then, a novel GA-based technique is presented to resolve this problem. The key proposition of the proposed GA is that the


channel order can be encoded as part of the chromosome. Consequently, the channel order and parameters can be estimated simultaneously. Simulation results show that the new GA outperforms existing GAs in convergence speed. We also compare the performance of the proposed GA with the closed-form subspace method, which assumes that the channel order is known [10]. Simulation results show that the proposed GA achieves a similar performance.
2. PROBLEM FORMULATION

We consider a multichannel FIR system with M subchannels. The transmitted discrete signal s(n) is modulated, filtered, and transmitted over these Gaussian subchannels. The received signals are filtered and down-converted to baseband. The resulting baseband signal at the mth sensor can be expressed as follows [1]:

    xm(n) = Σ_{k=0}^{L} hm(k) s(n − k) + bm(n),   m = 1, . . . , M,   (1)

where bm(n) denotes the additive Gaussian noise and is assumed to be uncorrelated with the input signal s(n), hm(n) is the equivalent discrete channel impulse response associated with the mth sensor, and L is the largest order of the subchannels (note that the subchannels may have different orders). Equation (1) can be represented in vector-matrix form as follows:

    xm(n) = Hm s(n) + bm(n),   m = 1, . . . , M,   (2)

where

    xm(n) = [xm(n) xm(n − 1) · · · xm(n − N)]^T   (3)

is the (N + 1) × 1 observed vector at the mth sensor,

    bm(n) = [bm(n) bm(n − 1) · · · bm(n − N)]^T   (4)

is the (N + 1) × 1 additive noise vector, and

    s(n) = [s(n) s(n − 1) · · · s(n − L − N)]^T   (5)

is the (N + L + 1) × 1 transmitted vector. The matrix

         [ hm,0 · · · hm,L                ]
    Hm = [        ⋱             ⋱         ]   (6)
         [            hm,0  · · ·  hm,L  ]

is the (N + 1) × (N + L + 1) transfer matrix of subchannel hm(n).
We define an M(N + 1) × 1 overall observation vector x(n) = [x1^T(n) · · · xM^T(n)]^T; then the multichannel system can be represented in matrix form as

    x(n) = H s(n) + b(n),   (7)

where H = [H1^T · · · HM^T]^T is the M(N + 1) × (N + L + 1) overall system transfer matrix and b(n) = [b1^T(n) · · · bM^T(n)]^T is the M(N + 1) × 1 additive noise vector.
If we define the output-autocorrelation matrix as Rxx = E[x(n)x(n)^H], then we have

    Rxx = H Rss H^H + Rbb,   (8)

where Rss = E[s(n)s(n)^H] is the (N + L + 1) × (N + L + 1) autocorrelation matrix of s(n) and Rbb = E[b(n)b(n)^H] is the M(N + 1) × M(N + 1) autocorrelation matrix of b(n). In the following, we present an objective function based on the subspace decomposition of Rxx. To exploit the subspace properties, the following assumptions must be made [10]: the parameter matrix H has full column rank, which implies M(N + 1) ≥ (N + L + 1); the subchannels do not share common zeros; and the autocorrelation matrix Rss has full rank.
The basic idea of subspace decomposition is to decompose Rxx into a signal subspace and a noise subspace. Let λ1 ≥ λ2 ≥ · · · ≥ λM(N+1) be the eigenvalues of Rxx. Since H has full column rank (N + L + 1) and Rss has full rank, the signal component of Rxx, that is, H Rss H^H, has rank N + L + 1. Therefore,

    λi > σn^2   for i = 1, . . . , N + L + 1,
    λi = σn^2   for i = N + L + 2, . . . , M(N + 1),   (9)

where σn^2 denotes the variance of the additive Gaussian noise.
If we perform the subspace decomposition of Rxx, we get

    Rxx = U Λ U^H = [Us Un] diag{Λs, Λn} [Us Un]^H,   (10)

where Λs = diag{λ1, . . . , λN+L+1} contains the N + L + 1 largest eigenvalues of Rxx in descending order and the columns of Us are the corresponding orthogonal eigenvectors of λ1, . . . , λN+L+1, while Λn = diag{λN+L+2, . . . , λM(N+1)} contains the remaining eigenvalues and the columns of Un are the orthogonal eigenvectors corresponding to the eigenvalue σn^2. The spans of Us and Un denote the signal subspace and the noise subspace, respectively. The key observation is that the columns of H also span the signal subspace of Rxx. The channel parameters can then be uniquely identified through the orthogonality between the signal subspace and the noise subspace [10], that is,

    H^H Un = 0.   (11)

Let h = [h1,0 · · · h1,L · · · hM,0 · · · hM,L]^T contain all the channel parameters. From (11), we propose an objective function as follows:

    J(h) = ||H^H Un||.   (12)
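For a small real-valued, noise-free example, the objective in (12) can be exercised numerically (our own sketch and channel values; NumPy assumed, and for real data the Hermitian transpose reduces to the ordinary transpose):

```python
import numpy as np

def channel_matrix(h, N):
    """Build the (N+1) x (N+L+1) filtering matrix Hm of (6) for one subchannel."""
    L = len(h) - 1
    Hm = np.zeros((N + 1, N + L + 1))
    for i in range(N + 1):
        Hm[i, i:i + L + 1] = h  # shifted copy of the impulse response
    return Hm

def objective(h_list, Un, N):
    """J(h) = ||H^H Un|| of (12), stacking the subchannel matrices into H."""
    H = np.vstack([channel_matrix(h, N) for h in h_list])
    return np.linalg.norm(H.T @ Un)

# Two subchannels of order L = 2 with no common zeros; window length N + 1 = 4.
h1, h2 = [1.0, 0.5, 0.2], [0.3, 1.0, -0.4]
N = 3
H = np.vstack([channel_matrix(h1, N), channel_matrix(h2, N)])
Rxx = H @ H.T                             # noise-free covariance with white input (Rss = I)
eigval, eigvec = np.linalg.eigh(Rxx)      # eigenvalues in ascending order
Un = eigvec[:, :H.shape[0] - H.shape[1]]  # noise subspace: 8 - 6 = 2 eigenvectors
print(objective([h1, h2], Un, N))         # vanishes (up to roundoff) for the true channel
```

Perturbing any tap of h1 or h2 makes the objective strictly positive, which is what the search in Section 3 exploits.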

In this objective function, the channel order is assumed


to be known. However, in practice this is not true. Therefore, the channel order must be estimated beforehand. In this
paper, we estimate the channel order based on (12). Since



the subchannels may have different orders, order estimation refers to the largest of them. Note that channel identifiability does not depend on whether the subchannels have the same order but on whether they share common zeros [10]. We show that order estimation affects the number of global optima of (12): J(h) has only one nonzero optimum when the channel order is correctly estimated [10]. We now study, based on (12), the cases where the channel order is either under- or overestimated.
If the channel order is overestimated, then J(h) will have more than one nonzero optimum. For instance, let the estimated order be L + 1; we define


    h1m = [0  hm,0 · · · hm,L]^T,
    h2m = [hm,0 · · · hm,L  0]^T.   (13)

By constructing H1 and H2 from h1m and h2m, one can verify that H1 and H2 satisfy the following condition:

    Un^H H1 = Un^H H2 = 0.   (14)

This means that J(h) will have two linearly independent nonzero optima:

    h1 = [h11^T · · · h1M^T]^T,
    h2 = [h21^T · · · h2M^T]^T.   (15)

It is straightforward to show that if the channel order is underestimated, then J(h) has no nonzero optimum. If this were not true, then, by the above derivation, J(h) with the correctly estimated order would have more than one nonzero solution, which contradicts the conclusion in [10].
Therefore, the optima of J(h) satisfy the following conditions:
(i) overestimated order: more than one nonzero optimum;
(ii) correctly estimated order: only one nonzero optimum;
(iii) underestimated order: no nonzero optimum.
Now let l denote the estimated order. Assuming that the channel order is unknown, we propose to include l in the objective function of (12), giving a new objective function J(l, h) = ||H^H U_n||. In order to let l converge to the correct order, the following conditions must be met:
(1) the trivial solution, that is, h = 0, must be avoided;
(2) l must be more likely to converge to a small order.

Note that h has a free constant scale: if h^ is a solution of (11), then a h^, where a is an arbitrary constant, is also a solution of (11). A common technique to avoid the trivial solution is to normalize h to ||h|| = 1 [5, 6, 10]. In this paper, we extend this constraint to ||h|| >= 1 and concentrate on a special case: we fix the first parameter of h to h(1) = 1. Such a constraint avoids computing the normalization during the iteration. Note that l affects the objective value through the number of elements of h used to compute it. A smaller l implies that fewer elements are used and, consequently, may result in a smaller objective value. Therefore, such a constraint also helps l converge to a smaller value.

To ensure condition (2), we impose a penalty on J(l, h) when a larger estimate of the channel order is reached. In practice, the objective value J(l, h) converges to a small value rather than exactly zero; therefore, we apply a multiplicative rather than an additive penalty. The following objective function is proposed:

J(l, h) = l^K ||U_n^H H||,    (16)

where K scales the penalty and must satisfy K >= 0.
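A minimal sketch of the order-penalized objective of (16) is shown below. The function and variable names are illustrative assumptions: the genes beyond the estimated order l are simply ignored when building H, and the multiplicative penalty l^K (K >= 0) favors smaller orders.

```python
import numpy as np

def conv_matrix(h, N):
    """(N+1) x (N+L+1) convolution matrix of one subchannel."""
    L = len(h) - 1
    Hm = np.zeros((N + 1, N + L + 1))
    for i in range(N + 1):
        Hm[i, i:i + L + 1] = h
    return Hm

def penalized_objective(l, param_genes, Un, N, K=2.0):
    # param_genes: one gene vector per subchannel; only the first l+1
    # genes of each vector are interpreted as channel taps.
    H = np.vstack([conv_matrix(g[: l + 1], N) for g in param_genes])
    return (l ** K) * np.linalg.norm(Un.T @ H)
```

Setting K = 0 recovers the unpenalized subspace objective, so the penalty factor can be verified directly: for a fixed chromosome, J(l, h) with penalty equals l^K times J(l, h) without it.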
3. GENETIC ALGORITHM

A GA is a random search algorithm that mimics the process of biological evolution. The algorithm begins with a collection of candidate solutions (each called a chromosome), and each is evaluated for its fitness for solving a given optimization task. In each generation, the fittest chromosomes are allowed to mate, mutate, and give birth to offspring. These children form the basis of the new generation. Since the children's generation always contains the elite of the parents' generation, a newborn generation tends to be closer to a solution of the optimization problem. After a number of generations, workable solutions can be achieved if some convergence criteria are satisfied. In fact, a GA is a very flexible tool and is usually adapted to the given optimization problem. The features of the proposed GA are described below.
Encoding
Each chromosome has two parts: one represents the channel order and is encoded in binary, and the other represents the channel parameters and is encoded with real values. Let (c, h)^i_j (j = 1, ..., Q) denote the jth chromosome of the ith generation, where Q is the population size. The chromosome structure is as follows:

[c_1 c_2 ... c_S | h_1 h_2 ... h_T]    (17)

where c_1 c_2 ... c_S are the binary-encoded order genes, h_1 h_2 ... h_T are the real-value-encoded parameter genes, and the parameter part has the same structure as h. Note that the length of the order part determines the length of the parameter part, and one should ensure that the length of the parameter part is greater than the largest possible channel order.
Initialization
Normally, the initial values of the chromosomes are randomly assigned. In the proposed GA, in order to prevent the algorithm from converging to a trivial solution, as shown in Section 2, the first parameter of h (i.e., the first gene of the parameter part) is fixed to h_1 = 1, while the other genes are randomly initialized.
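The encoding of (17) and this initialization rule can be sketched as follows. The lengths S = 3 and T = 16 follow Table 1; the helper names and the binary-to-order mapping are our own illustrative assumptions.

```python
import random

S, T = 3, 16  # order bits and parameter genes (Table 1)

def random_chromosome():
    # A chromosome = (binary order part, real-valued parameter part).
    # The first parameter gene is pinned to 1 so the trivial solution
    # h = 0 cannot be represented.
    order_bits = [random.randint(0, 1) for _ in range(S)]
    params = [1.0] + [random.uniform(-1.0, 1.0) for _ in range(T - 1)]
    return order_bits, params

def decoded_order(order_bits):
    # Maps the S bits onto the order search range 1 .. 2^S (1..8 here).
    return 1 + sum(b << i for i, b in enumerate(order_bits))
```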
Fitness function
In the proposed GA, tournament selection is adopted, in which the objective values are obtained by computing the value in (16). Consequently, it is not necessary to map the objective value to a fitness value. Since the order chromosomes have a very simple coding (binary) and a smaller gene pool, they are expected to converge much faster than the parameter chromosomes. Thus, we propose to detect the convergence of the order chromosomes and of the parameter chromosomes separately. However, it should be noted that the objective values of (16) cannot directly indicate the fitness of the order chromosomes. A fitness function for the order chromosomes is therefore required and is defined as follows: the fitness of an estimated order l is measured as the number of chromosomes whose order equals l. The order fitness of (c, h)^i_j is denoted as

f(c^i_j) = cum^i_j(l).    (18)

This fitness function is not used in tournament selection but only in the convergence criterion of the order chromosomes.
Parent selection
A good parent selection mechanism gives better parents a better chance to reproduce. In the proposed GA, we employ an elitist method [8] and tournament selection [11]. First, part of the present population, that is, the aQ best chromosomes (a being the elite selection ratio), is selected directly. Then, the other (1 - a)Q child chromosomes are generated via tournament selection within the whole parent population; that is, in each cycle two chromosomes are randomly selected from the parent population, and the one with the smaller objective value is selected.
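A sketch of this selection step is given below, with chromosomes abstracted to arbitrary objects scored by a caller-supplied `objective` (smaller is better). The function name and the default elite ratio of 1/12 (from Table 1) are the only assumptions.

```python
import random

def select_parents(population, objective, elite_ratio=1.0 / 12.0):
    Q = len(population)
    ranked = sorted(population, key=objective)
    n_elite = max(1, int(elite_ratio * Q))
    selected = ranked[:n_elite]                 # elitism: best chromosomes survive
    while len(selected) < Q:                    # binary tournaments fill the rest
        a, b = random.sample(population, 2)
        selected.append(a if objective(a) < objective(b) else b)
    return selected
```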
Crossover
Crossover combines the features of two parent chromosomes to form two child chromosomes. Generally, the parent chromosomes are mated randomly [12]. In the proposed GA, each chromosome contains two parts with different coding techniques. The order part decides how many elements of the parameter part are used to calculate the objective value; therefore, the two parts cannot be decoupled, and conventional methods that perform crossover on them separately may not be efficient. Normally, the order chromosomes are short. For instance, an order chromosome of length 5 implies a search space from 1 to 32, which covers most practical FIR channels. Therefore, the order chromosomes are expected to converge much faster than the parameter chromosomes. We propose not to perform crossover on the order chromosomes but to use mutation only. For the parameter chromosomes, crossover between chromosomes with different orders is more explorative (i.e., it searches more of the data space), but it may damage the building blocks in the parent chromosomes. On the other hand, crossover between chromosomes with the same order is more exploitative (i.e., it speeds up convergence), but it may cause premature convergence. Since faster convergence is preferable in blind channel identification, we propose to mate chromosomes of the same order. For each estimated order, if the number of corresponding chromosomes is odd, a randomly selected chromosome is added to the mating pool.
Assume that the chromosomes are mated and a pair of them is given as

(c, h)^i_j = ([c_1 c_2 ... c_S], [h_1 h_2 ... h_T])^i_j,
(c, h)^i_k = ([c_1 c_2 ... c_S], [h_1 h_2 ... h_T])^i_k.    (19)

Let a_1, a_2 in [1, T] be two random integers (a_1 < a_2), and let b_{a_1+1}, ..., b_{a_2} be a_2 - a_1 random real numbers in (0, 1); then the parameter parts of the child chromosomes are defined as

h^{i+1}_j = [h^i_{1,j} ... h^i_{a_1,j},  b_{a_1+1} h^i_{a_1+1,j} + (1 - b_{a_1+1}) h^i_{a_1+1,k},  ...,  b_{a_2} h^i_{a_2,j} + (1 - b_{a_2}) h^i_{a_2,k},  h^i_{a_2+1,j} ... h^i_{T,j}],

h^{i+1}_k = [h^i_{1,k} ... h^i_{a_1,k},  b_{a_1+1} h^i_{a_1+1,k} + (1 - b_{a_1+1}) h^i_{a_1+1,j},  ...,  b_{a_2} h^i_{a_2,k} + (1 - b_{a_2}) h^i_{a_2,j},  h^i_{a_2+1,k} ... h^i_{T,k}],    (20)

where a two-point crossover is adopted.
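The two-point arithmetic crossover of (20) can be sketched as follows: genes outside the segment [a_1, a_2) are copied from each parent, while genes inside are random convex blends of the two parents' genes. The parents are assumed to have already been matched by order; the function name is an illustrative assumption.

```python
import random

def crossover_params(hj, hk):
    T = len(hj)
    a1, a2 = sorted(random.sample(range(T), 2))  # two crossover points
    cj, ck = list(hj), list(hk)
    for t in range(a1, a2):
        beta = random.random()                   # blend weight in (0, 1)
        cj[t] = beta * hj[t] + (1.0 - beta) * hk[t]
        ck[t] = beta * hk[t] + (1.0 - beta) * hj[t]
    return cj, ck
```

Because each blended gene is a convex combination, every child gene lies between the corresponding parent genes, which keeps the offspring inside the region spanned by the parents.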
Mutation
A mutation feature is introduced to prevent premature convergence. Originally, mutation was designed only for binary-represented chromosomes. For real-valued chromosomes, the following random mutation is now widely adopted [12]:

g~ = g + n(u, s),    (21)

where g is the real-valued gene and n is a random function, which may be Gaussian or uniform, with mean u and variance s. In this paper, we use normal bit-flip mutation for the order genes; that is, we randomly alter the genes from 0 to 1 or from 1 to 0 with probability P_m. Normally, P_m is a small number. However, in the proposed GA, the value of the order chromosome decides how many parameter genes are used to calculate the objective function. A smaller order means fewer parameter genes and, consequently, a smaller objective value. Therefore, in the start-up period of the iteration, the order chromosomes are likely to converge prematurely to the smallest order (order equal to 1). A large mutation rate is adopted to prevent such premature convergence.
For the parameter part, a uniform PDF is employed. Let a_3, a_4 in [1, T] be two random integers (a_3 < a_4), and let b_{a_3+1}, ..., b_{a_4} be a_4 - a_3 random real numbers in (-1, 1); then the parameter parts of the child generation are defined as

h^{i+1}_j = [h_1, ..., h_{a_3},  h_{a_3+1} + b_{a_3+1}/P,  ...,  h_{a_4} + b_{a_4}/P,  h_{a_4+1}, ..., h_T],    (22)

where P is a predefined number that can be adjusted during the iteration to speed up convergence.
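The two mutation operators can be sketched together: bit-flip mutation on the order genes with (initially large) probability p_m, and the bounded uniform perturbation of (22) on a random segment of the parameter genes. Gene 0 is left untouched because it is pinned to h(1) = 1; the function name and defaults are illustrative assumptions.

```python
import random

def mutate(order_bits, params, pm=0.5, P=10.0):
    # Bit-flip mutation on the binary order part.
    new_bits = [1 - b if random.random() < pm else b for b in order_bits]
    # Uniform perturbation of a random gene segment [a3, a4), as in (22);
    # each perturbation is bounded by 1/P in magnitude.
    T = len(params)
    a3, a4 = sorted(random.sample(range(1, T), 2))  # gene 0 stays pinned
    new_params = list(params)
    for t in range(a3, a4):
        new_params[t] += random.uniform(-1.0, 1.0) / P
    return new_bits, new_params
```

Increasing P over the generations shrinks the perturbation and so sharpens the search around the current optimum.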


Table 1: The GA configuration.

    Population size                                  48
    Length of order chromosomes (S)                  3
    Length of parameter chromosomes (T)              16
    Penalty scale (K)
    Elite selection ratio                            1/12
    Mutation rate of order chromosomes (p_m)         0.5
    Mutation scale of parameter chromosomes (P)      10 * 2^floor(m/100)
    Control parameters of the convergence criteria   30, 0.1, 0.1

Convergence criterion
We propose different convergence criteria for the order chromosomes and the parameter chromosomes. The order chromosomes are considered to have converged if the gene pool is dominated by a certain order, that is,

cum^i_j(l_D) - cum^i_j(l) >= r cum^i_j(l_D)  for all other orders l,    (23)

where l_D is the dominant order, cum^i_j(l_D) is the number of chromosomes with order l_D, and r is a predefined ratio.
When the order chromosomes have converged, their mutation rate is set to zero (p_m = 0). The parameter chromosomes are considered to have converged if the change in the smallest objective value within X generations is small, that is,

|J(c, h)^i - J(c, h)^{i-X}| < e J(c, h)^i,    (24)

where e is also a predefined ratio. Theoretically, the objective function in (16) has multiple minima, some of which may have overestimated orders. In order to make the order chromosomes converge to the correct channel order, we impose a penalty on the chromosomes with greater order. Due to the random nature of a GA, though in most cases the order chromosomes converge to the real channel order (see the simulation results in Table 2), there is no guarantee that they will always do so. Therefore, we propose to examine the converged result to ensure correct convergence. Let (c, h)_{s1} be the current converged result; the examination is carried out as follows (see the outer loop in Figure 2): reduce the order of (c, h)_{s1} by 1, fix the order, and run the proposed GA again (note that this time the order chromosomes are fixed, i.e., p_m = 0). After a few generations, a new result, denoted (c, h)_{s2}, is obtained. If the objective values of (c, h)_{s1} and (c, h)_{s2}, that is, J(c, h)_{s1} and J(c, h)_{s2}, are close enough, then we decide that (c, h)_{s1} has an overestimated order and

Figure 1: Decision region for outer loop criterion.

reexamine (c, h)_{s2} using the same strategy. Otherwise, if the drop from J(c, h)_{s1} to J(c, h)_{s2} is significantly large, the following inequality holds:

J(c, h)_{s1} - J(c, h)_{s2} > g (J(c, h)_{s1} + J(c, h)_{s2}),    (25)

where g is a predefined ratio. The drop between J(c, h)_{s1} and J(c, h)_{s2} is then considered large enough for us to say that (c, h)_{s1} has converged to the real channel order. From the inequality in (25), one can draw two lines with slopes (g + 1)/(g - 1) and (g - 1)/(g + 1) (see Figure 1); the shaded region in Figure 1 shows the data space given by (25). The criterion in (25) is, in fact, an enumeration search. However, the order estimation in the proposed GA does not rely solely on this enumeration search: we have employed several strategies to give the order chromosomes a better chance of converging to the real channel order, and the simulation results show that in most cases they converge to (or close to) the real channel order (see Table 2). The enumeration search is thus used to compensate for this drawback of the GA.
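The outer-loop decision of (25) reduces to a one-line test. The sketch below follows the paper's stated logic literally; the function name and the value of g (gamma) are illustrative assumptions, since the exact control parameters are not fully legible in Table 1.

```python
def order_was_overestimated(J_s1, J_s2, gamma=0.1):
    # (25): accept the previous order as the true one only if the
    # objective drops sharply when the order is reduced by one;
    # otherwise the order was overestimated and the reduced-order
    # result must be reexamined.
    significant_drop = (J_s1 - J_s2) > gamma * (J_s1 + J_s2)
    return not significant_drop
```

For instance, `order_was_overestimated(1.0, 0.9)` is True (the two objectives are close, so the search continues at the reduced order), while `order_was_overestimated(1.0, 0.001)` is False (an exponential drop, so the previous order is taken as the estimate).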


Figure 2: Flow diagram of the proposed GA. Start; configure the GA according to Table 1 and initialize the chromosomes; perform the GA operations of selection, crossover, and mutation; evaluate the chromosomes with the objective function (16) and the order fitness function (18); once the order-convergence condition (23) is satisfied, set P_m = 0; when the parameter-convergence condition (24) is satisfied (inner loop), store the converged result, otherwise reinitialize the parameter chromosomes; then reduce the order chromosomes by 1, keep P_m = 0, and rerun until the outer-loop condition (25) is satisfied; then terminate.

The overall flow diagram of the proposed approach is illustrated in Figure 2. It can be seen that the proposed GA has
an inner and an outer loop. The criteria in (23) and (24) in
the inner loop guarantee that a global optimum is achieved.
We have shown that this solution may have an overestimated
order. The criterion in (25) in the outer loop is used to reexamine the solution reached and guarantee the correct estimate.
It is important to note that although the order part and the parameter part have distinct representations, fitness functions, and convergence criteria, we encode the two parts into a single chromosome rather than keeping two separate chromosomes. This is because the order part decides how many genes of the parameter part are used to calculate the objective value and, therefore, the two parts cannot be decoupled.
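The inner/outer loop structure described above (and shown in Figure 2) can be sketched at a high level as follows. `run_ga_until_converged` stands in for one full inner-loop GA execution under conditions (23) and (24); all names and the value of gamma are illustrative assumptions.

```python
def estimate_channel(run_ga_until_converged, gamma=0.1):
    # Inner loop with free order: returns the converged order estimate
    # and its objective value.
    order, J_prev = run_ga_until_converged(fixed_order=None)
    while True:
        # Outer loop: re-run with the order reduced by 1 and p_m = 0.
        _, J_new = run_ga_until_converged(fixed_order=order - 1)
        if J_prev - J_new > gamma * (J_prev + J_new):
            return order, J_prev            # sharp drop: order accepted
        order, J_prev = order - 1, J_new    # values close: reexamine
```

With a stub that reports close objective values for orders 6 and 5 and a sharp drop at order 4 (as in Figure 4), this driver returns order 5.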

4. EXPERIMENTAL RESULTS

Computer simulations are done to evaluate the performance of the proposed GA. We use the same multichannel FIR system as in [9], where two sensors are adopted and the channel impulse responses are

h_1 = [0.21  0.50  0.72  0.36  0.21]^T,
h_2 = [0.227  0.41  0.688  0.46  0.227]^T.    (26)

Table 1 shows the configuration of the proposed GA. A large population size is used in order to explore a greater data space. The search space of the channel order is from 1 to 8 (S = 3). In blind channel estimation, an FIR multichannel model is normally obtained by oversampling the output of a real channel. A multichannel model with two subchannels of
Table 2: Estimated order in the first inner loop run.

    Converged order    5        6      7        8       Total
    Times              26       21     11       2       60
    Proportion         43.4%    35%    18.3%    3.3%    100%

Figure 3: Evolution curves with correctly estimated order in the first inner loop run (top: average order of the population; bottom: average J(c, h) of the population, versus generations).


order 8 represents a real channel of order 16, which covers most normal channels. Note that order chromosomes of length 3 can also map the search space from 9 to 16. So, in case no satisfactory solution is reached, one may remap the order search space (9-16) and rerun the algorithm. A large mutation rate (p_m = 0.5) is adopted to prevent premature convergence. To speed up the convergence of the parameter chromosomes, we adjust P every 100 generations (see Table 1), where floor(a) denotes the floor value of a.

A 25-dB Gaussian white noise is added to the output, and 2,000 output samples are used to estimate the autocorrelation matrix R_xx. Figure 3 shows a typical evolution curve. In
each generation, the average objective value and estimated
order of the whole population are plotted. From Figure 3,
one can see that the order chromosomes converge much
faster than the parameter chromosomes. They converge on
the true channel order in the first inner loop run (order = 5
in Figure 3). We store this converged result, reduce the order
by 1, set pm = 0, and then begin another GA execution. After
the convergence (order = 4 in Figure 3), we evaluate these
two converged results (order = 5 and order = 4 in Figure 3)
by using the outer loop criterion in (25). Since there is an exponential drop between the two results, the condition in (25)
is satisfied. Thus, our algorithm stops and concludes that order 5 is the final estimate.
The channel order is estimated by detecting the drop between two converged objective values, which is similar to the traditional method in which the eigenvalues of an overmodeled covariance matrix are calculated and the channel order is determined where there is a significant drop between two adjoining eigenvalues [4]. However, our algorithm is more efficient, since the eigenvalue decomposition is avoided, and the drop it detects is much more significant (an exponential drop).
Figure 4 shows an evolution curve where the channel order is overestimated in the first inner loop run (order = 6
in Figure 4). In Figure 4, the objective values of the first two
converged results are quite close, which does not satisfy the
criterion set in (25). Further examination is thus required.
As above, we can get the third converged result (order = 4 in
Figure 4). By evaluating it with (25), we can draw the same
conclusion as from Figure 3.
When compared with existing work, the convergence speed of the proposed GA is satisfactory: a quite reliable solution can be reached in about 1,000 generations, whereas the algorithm in [9] converges after 2,000 generations (note that in [9] the channel order is assumed to be known). In [8], an identification problem of similar complexity is simulated; that algorithm converges after hundreds of generations, but it is nonblind and, therefore, its objective function is quite simple.

Figure 4: Evolution curve with overestimated order in the first inner loop run.

It is important to note that the convergence speed is affected by the complexity of the target problem: a more complicated multichannel system will result in slower convergence. We simulated a multichannel system with four subchannels and found that the algorithm converges after 1,000 generations. The effect of problem complexity seems to be a common problem of GAs and needs further study.
Since the proposed GA needs to estimate the second-order statistics of the channel output (the autocorrelation matrix), it cannot be used directly in a rapidly varying channel. However, if a subspace tracking algorithm is employed (e.g., [13]), the noise subspace, that is, U_n in (16), can be updated when a new sample vector (x(n) in (7)) is received. The objective function can be adapted according to

However, the performance of the GA can be improved by making it execute more generation cycles.
5. CONCLUSIONS

Based on the SIMO model and the subspace criterion, a new GA has been proposed for blind channel estimation. Computer simulations show that its performance is comparable with that of existing closed-form approaches. Moreover, the proposed GA provides joint order and channel estimation, whereas most existing approaches must assume that the channel order is known or treat the problems of order estimation and parameter estimation separately.

Figure 5: Performance comparison (RMSE versus SNR in dB for SS-SVD and SS-GA).

the channel variation. In this case, the proposed GA may be applied to a rapidly varying channel. However, this requires further investigation and is beyond the scope of this paper.

It is obvious that the computation is costly if the converged order in the first inner loop run is much greater than the real channel order. In the proposed GA, though there is no guarantee that the order chromosomes converge exactly to the real channel order in the first inner loop run, we have proposed several strategies to make them converge more closely. To illustrate this point, 60 independent trials were done and the converged order in the first inner loop run was recorded. Table 2 shows the results: the first row gives the converged orders, the second row the number of trials in which the order chromosomes converged to a given order, and the third row the corresponding proportions. Table 2 illustrates that most of the time the order chromosomes converge to, or close to, the real channel order (orders 5 and 6 account for about 80% of the trials).
To evaluate the performance of the proposed GA, we compare it with a singular value decomposition-based closed-form approach (SVD) that assumes the channel order is known [10]. The root mean square error (RMSE) is employed to measure the estimation performance and is defined as

RMSE = (1/N_t) * sum_{i=1}^{N_t} ||h^_i - h|| / ||h||,    (27)

where N_t denotes the number of Monte Carlo trials and is set to 50, and h^_i denotes the estimated channel parameters in the ith trial. The comparison results are given in Figure 5.
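The normalized RMSE of (27) is straightforward to compute: the estimation error of each Monte Carlo trial, scaled by the norm of the true channel, averaged over the N_t trials. The function name below is an illustrative assumption.

```python
import numpy as np

def rmse(h_true, h_estimates):
    h_true = np.asarray(h_true, dtype=float)
    scale = np.linalg.norm(h_true)
    errors = [np.linalg.norm(np.asarray(h_i, dtype=float) - h_true) / scale
              for h_i in h_estimates]
    return float(np.mean(errors))
```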
It can be seen that the proposed GA achieves similar performance at low signal-to-noise ratio (SNR). At high SNR, the performance of the GA is worse because the converged result is not close enough to the real optimum.

ACKNOWLEDGMENTS
The authors would like to express their appreciation to the Editor-in-Charge of this manuscript, Prof. Riccardo Poli, for his effort in improving the quality and readability of this paper. This work was done while Dr. Chen was visiting the City University of Hong Kong; his work is supported by City University Research Grant 7001416 and the Doctoral Program Fund of China under Grant 20010561007.
REFERENCES
[1] L. Tong, G. Xu, and T. Kailath, Blind identification and
equalization based on second-order statistics: a time domain
approach, IEEE Transactions on Information Theory, vol. 40,
no. 2, pp. 340-349, 1994.
[2] L. Tong, G. Xu, B. Hassibi, and T. Kailath, Blind channel
identification based on second-order statistics: a frequency-domain approach, IEEE Transactions on Information Theory,
vol. 41, no. 1, pp. 329-334, 1995.
[3] L. Tong and S. Perreau, Multichannel blind identification:
from subspace to maximum likelihood methods, Proceedings
of the IEEE, vol. 86, no. 10, pp. 1951-1968, 1998.
[4] A. P. Liavas, P. A. Regalia, and J.-P. Delmas, Blind channel
approximation: effective channel order determination, IEEE
Trans. Signal Processing, vol. 47, no. 12, pp. 3336-3344, 1999.
[5] L. Tong and Q. Zhao, Joint order detection and blind channel
estimation by least squares smoothing, IEEE Trans. Signal
Processing, vol. 47, no. 9, pp. 2345-2355, 1999.
[6] J. Ayadi and D. T. M. Slock, Blind channel estimation and
joint order detection by MMSE ZF equalization, in Proc.
IEEE 50th Vehicular Technology Conference (VTC 99), vol. 1,
pp. 461-465, Amsterdam, The Netherlands, September 1999.
[7] L. Yong, H. Chongzhao, and D. Yingnong, Nonlinear system
identification with genetic algorithms, in Proc. 3rd Chinese
World Congress on Intelligent Control and Intelligent Automation (WCICA 00), vol. 1, pp. 597-601, Hefei, China, June-July
2000.
[8] L. Yao and W. A. Sethares, Nonlinear parameter estimation
via the genetic algorithm, IEEE Trans. Signal Processing, vol.
42, no. 4, pp. 927-935, 1994.
[9] S. Chen, Y. Wu, and S. McLaughlin, Genetic algorithm optimization for blind channel identification with higher order
cumulant fitting, IEEE Transactions on Evolutionary Computation, vol. 1, no. 4, pp. 259-265, 1997.
[10] E. Moulines, P. Duhamel, J.-F. Cardoso, and S. Mayrargue,
Subspace methods for blind identification of multichannel
FIR filters, IEEE Trans. Signal Processing, vol. 43, no. 2, pp.
516-525, 1995.



[11] K. Krishnakumar, Microgenetic algorithms for stationary
and nonstationary function optimization, in Proc. Intelligent
Control and Adaptive Systems, vol. 1196 of SPIE Proceedings,
pp. 289-296, Philadelphia, Pa, USA, November 1990.
[12] K. F. Man, K. S. Tang, and S. Kwong, Genetic Algorithms: Concepts and Design, Springer-Verlag, London, UK, 1999.
[13] S. Attallah and K. Abed-Meraim, Fast algorithms for subspace tracking, IEEE Signal Processing Letters, vol. 8, no. 7,
pp. 203-206, 2001.

Chen Fangjiong was born in 1975, in


Guangdong province, China. He received
the B.S. degree from Zhejiang University
in 1997 and the Ph.D. degree from South
China University of Technology in 2002, all
in electronic and communication engineering. He worked as a Research Assistant in
City University of Hong Kong from January 2001 to September 2001 and from January 2002 to May 2002. He is currently with
the School of Electronic and Communication Engineering, South
China University of Technology. His research interests include
blind signal processing and wireless communication.
Sam Kwong received his B.S. and M.S. degrees in electrical engineering from the State
University of New York at Buffalo, USA, and
University of Waterloo, Canada, in 1983 and
1985, respectively. In 1996, he received his
Ph.D. degree from the University of Hagen, Germany. From 1985 to 1987, he was a
Diagnostic Engineer with Control Data Canada, where he designed the diagnostic software to detect manufacturing faults in the VLSI chips of the Cyber 430 machine. He later joined Bell Northern Research Canada as a Member of Scientific Staff, where he worked on both the DMS-100 voice network and the DPN-100 data network projects. In 1990, he joined the City University
of Hong Kong as a Lecturer in the Department of Electronic Engineering. He is currently an Associate Professor in the Department
of Computer Science at the same university. His research interests
are in genetic algorithms, speech processing and recognition, data
compression, and networking.
Wei Gang was born in January 1963. He received the B.S., M.S., and Ph.D. degrees in
1984, 1987, and 1990, respectively, from Tsinghua University and South China University of Technology. He was a Visiting Scholar
to the University of Southern California
from June 1997 to June 1998. He is currently
a Professor at the School of Electronic and
Communication Engineering, South China
University of Technology. He is a Committee Member of the National Natural Science Foundation of China.
His research interests are signal processing and personal communications.


EURASIP Journal on Applied Signal Processing 2003:8, 766-779
(c) 2003 Hindawi Publishing Corporation


Application of Evolution Strategies to the Design of Tracking Filters with a Large Number of Specifications

Jesús García Herrero
Departamento de Informática, Escuela Politécnica Superior (EPS), Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Email: jgherrer@inf.uc3m.es

Juan A. Besada Portas


Departamento de Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación,
Universidad Politécnica de Madrid, 28040 Madrid, Spain
Email: besada@grpss.ssr.upm.es

Antonio Berlanga de Jesús
Departamento de Informática, EPS, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Email: aberlan@ia.uc3m.es

José M. Molina López
Departamento de Informática, EPS, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Email: molina@ia.uc3m.es

Gonzalo de Miguel Vela


Departamento de Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación,
Universidad Politécnica de Madrid, 28040 Madrid, Spain
Email: gonzalo@grpss.ssr.upm.es

José R. Casar Corredera
Departamento de Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación,
Universidad Politécnica de Madrid, 28040 Madrid, Spain
Email: jramon@grpss.ssr.upm.es
Received 28 June 2002 and in revised form 14 February 2003
This paper describes the application of evolution strategies to the design of interacting multiple model (IMM) tracking filters in
order to fulfill a large table of performance specifications. These specifications define the desired filter performance in a thorough
set of selected test scenarios, for dierent figures of merit and input conditions, imposing hundreds of performance goals. The
design problem is stated as a numeric search in the filter parameters space to attain all specifications or at least minimize, in
a compromise, the excess over some specifications as much as possible, applying global optimization techniques coming from
evolutionary computation field. Besides, a new methodology is proposed to integrate specifications in a fitness function able to
eectively guide the search to suitable solutions. The method has been applied to the design of an IMM tracker for a real-world
civil air trac control application: the accomplishment of specifications defined for the future European ARTAS system.
Keywords and phrases: evolution strategies, radar tracking filters, multicriteria optimization.

1. INTRODUCTION

A tracking filter has the double goal of reducing measurement noise and consistently predicting future values of the signal. This kind of problem has efficient solutions in the case of stationary signals, but solutions for nonstationary problems are not yet as consolidated. This is the case in the field we deal with in this paper: tracking aircraft trajectories from radar measurements in air traffic control (ATC) applications.

Evolution Strategies to Design Tracking Filters


The design of tracking filters for the ATC problem demands complex algorithms, like the modern interacting multiple model (IMM) filter [1]. These algorithms depend on a high number of parameters (seven in the IMM design presented here) which must be adjusted in order to achieve, as far as possible, the desired tracking filter performance. The IMM filter has shown certainly satisfactory performance for tracking maneuvering targets, in relation to previous approaches. However, the relation between its input parameters and final performance is far from clear due to strongly nonlinear interactions among all the parameters. Therefore, no direct design methodology has been proposed to date to generate the best solution for a specific application, apart from manual parameterization and evaluation with simulation.
Besides, real-world applications of tracking filters for ATC usually address performance specifications defined over an exhaustive set of realistic operational scenarios and covering a number of conflicting figures of merit. These two characteristics, a large table of specifications and the application of complex algorithms, make the design of a modern tracking filter a very complex problem.
In this paper, the authors expose a new methodology to
design and adjust tracking filters for ATC applications based
on the use of evolution strategies (ES) as an optimization
problem over a customized cost function (fitness function).
The method has been demonstrated by the design of a realworld engineering application: a modern ATC system promoted by EUROCONTROL for Europe, the ARTAS system.
Due to the high dimensionality of the parameter space and the large number of defined constraints (the operational scenarios and performance figures add up to 264 specifications for ARTAS), an automatic procedure to search for and tune the final solution is mandatory. Classical techniques, such as those
based on gradient descent, were discarded due to the high
number of local minima presented by the fitness function.
ES have been selected for this problem due to their high robustness and immunity to local extremes/discontinuities in
the fitness function.
However, the selection of a fitness function taking account of all specifications is not so direct since all of them
should be simultaneously considered to guide the search.
The performance of ES has been analyzed in previous works for sets of test functions, but their application to a real engineering problem with hundreds of specifications, where the properties of the fitness landscape are not well known, is a harder task. A procedure has been proposed to build this function, exploiting specific knowledge about the domain. Objectives with similar behavior in the search are grouped first to select the worst cases for each group, which are then combined in the final cost function. Results show that this procedure is able to find acceptable solutions, lowering the excess over some specifications as much as possible.
The paper starts by presenting the design performance constraints for ATC problems in Section 2 (particularized for an industrial application, the ARTAS system) and a description of the IMM algorithm in Section 3. In Section 4, we explain the proposed optimization method based on ES. Finally, Sections 5 and 6 are aimed at discussing the optimization results and the characteristics of the solutions minimizing the fitness function, and at summarizing the main conclusions.
2. SPECIFICATIONS FOR TRACKER MODULE OF ARTAS SYSTEM
ARTAS [2] is the concept of a Europe-wide distributed surveillance system developed by EUROCONTROL, relying on the implementation of interoperable units coordinated with each other. Each ARTAS unit will be in charge of processing all surveillance data reports (i.e., primary and secondary radar reports, ADS reports, etc.) to form a good estimate of the current air traffic situation in its responsibility volume.
Each of the ARTAS units should fulfill a set of well-defined interoperability requirements to ensure a very high quality of the assessed air situation that will be delivered to the rest of the units. ARTAS defines, at a highly detailed level, the required performance for all components, and especially for the tracker systems which process radar data. To do this, it considers that the worst track performance is to be expected in the case that a tracker receives only monoradar data, while other cases, where extra data are fused, lead to relatively better performance. Therefore, the main emphasis is given to this monoradar case, leaving the definition of performance for other cases as a matter of specifying improvement factors. The most important aspect considered for tracker quality definition is the specification of track output quality in a set of well-defined representative input conditions. These conditions are classified with respect to radar and aircraft characteristics because of the very different behavior of any tracker for varying input conditions. Radar parameters represent the accuracy and quality of available data, while target conditions are the distance and orientation of the flight with respect to the radar, the motion state of the aircraft (uniform velocity, turning, accelerating), and specific values of speed and acceleration.
Since it would not be possible to specify the performance for all possible input situations, which would require an enormous amount of figures, an area is defined in which the performance is described by a limited number of parameters and some simple relations. Besides, since ARTAS will provide radar data processing basically for the control of civil aircraft, the specifications consider the most representative situations and the upper and lower limits of speed and acceleration in these conditions. ARTAS differentiates scenarios for the two basic types of controlled areas in ATC: the terminal maneuvering area (TMA), covered by sensors with a shorter refresh period (4 seconds) and moderate range (up to 80 nautical miles or NM), and the enroute area, covered by sensors with a longer period (12 seconds) and larger coverage (up to 230 NM). We have considered in this study the enroute area, since the difficulty of achieving the specified performance figures is higher in this situation, the design process for other situations being completely similar.

Out of all possible combinations, ARTAS has made a selection containing the most important and realistically worst cases. It comprises a number of simple input scenarios on which the nominal track quality requirements are defined. The methodology specified for this evaluation is based on Monte Carlo simulation with the input parameters (radar and trajectory parameters) particularized for each scenario. The trajectories in different scenarios vary in the following features:
(i) orientation with respect to the radar (radial or tangential starting courses, starting at a short, medium, or maximum range);
(ii) sequence of different modes of flight (uniform, turns, and longitudinal accelerations);
(iii) values of accelerations (upper and lower limits);
(iv) values of speeds (upper and lower limits).
There are eight specified simple scenarios with uniform motion, and twelve complex scenarios including initialization with uniform motion, a transition to a transversal maneuver, and a second transition to come back to uniform motion. When the target is far enough from the radar, a pure radial approach to the radar leads to the worst case for transversal and heading errors during maneuver transitions, since the azimuth error (much higher than the radial error) is projected over these components. By a similar reasoning, a pure tangential approach is the worst case for longitudinal and groundspeed errors during maneuvers. So, the scenarios basically contain these two types of situations, varying in distance, velocities, and acceleration magnitudes. The authors have considered a couple of scenarios with longitudinal maneuvers, although ARTAS does not specify performance for that type of situation. The reason for this is that these operations appear in civil operations (especially in the TMAs) and the filter is conceived to operate in real conditions. Otherwise, the resulting tracking filter could be overfitted to transversal maneuvers, while developing undesirable systematic errors with longitudinal maneuvers. The specifications for longitudinal scenarios were obtained by extrapolating the ARTAS relations for the new input conditions. The resulting 22 scenarios, to be taken into account in the design of the tracking filter, are shown in Figure 1 (a circle represents the radar position
and a square the initial position of the target trajectory). Since the specifications depend tightly on the input conditions, there is no a priori worst-case scenario whose attainment would guarantee all cases; all of them have to be considered simultaneously in the design process. It must be taken into account that the design of the tracker will be done considering that all requirements are to be met without intermediate adaptation of the tracker parameters, once the tracker has been tuned for the typical radar characteristics and controlled volume (in this case, the enroute area). The design will provide a single set of parameters that should allow the filter to accomplish all the specifications in all the scenarios considered.
For each of these scenarios, the performance of the tracker should approach the listed performance goal values under the defined conditions. The accuracy requirements are
expressed as a function of several input parameters depending on each specific tested scenario: groundspeed, range, orientation of the trajectory with respect to the radar (radial and tangential projections of velocity heading), magnitude of the transversal acceleration, and magnitude of the groundspeed change. There are four quality parameters on which the requirements are defined: two for position (errors measured along and across the trajectory direction, respectively longitudinal and transversal errors) and two for velocity (errors expressed in the groundspeed and heading components). All of them are expressed as root mean square errors (RMSE), estimated by means of Monte Carlo simulation. Similarly, accuracy requirements are also defined for the vertical coordinate, but this work will address only the 2D (horizontal) filtering, although similar ideas could be used for the design of a vertical tracker.
There are three basic parameters characterizing the desired shape of the RMS functions: the peak value (RMSpv), the convergence value (RMScv), and the time period of RMS convergence to a certain level close to the final convergence value (RMScv + c · RMScv). These values are specified for different situations: initialization, transition from uniform motion to turn, and transition back from turn to uniform motion. Therefore, for each type of situation, the specifications are particularized according to the target evolution, defining a bounding mask for each magnitude and scenario.
An example is indicated in Figure 2, with the transversal error obtained through simulation and the ARTAS bounding mask for scenario 10. Instead of measuring performance along the whole trajectory in each scenario, only some points of interest in the aircraft trajectory will be assessed to guarantee that the measured performance attains the bounding mask: the convergence RMSE in rectilinear motion before and after maneuver segments (CV1 and CV2), and the maximum RMSE during the maneuver (PV).
The design of a tracking filter aims at attaining a satisfactory trade-off among all specifications. The quality of the design will be evaluated by means of simulation over the 22 test scenarios, producing several types of trade-offs to be considered. First, the different transitions in modes of flight (uniform and maneuvers) impose a trade-off between steady-state smoothing and peak error during maneuvers, which always lead to conflicting requirements (the higher the smoothing factor, the higher the filter error during transitions, and vice versa). This is considered with the three representative values for each scenario and magnitude: CV1, CV2, and PV. Secondly, each one of the magnitudes evaluated (transversal, longitudinal, heading, and groundspeed RMS errors) could individually shift the design towards different solutions, and so all magnitudes must be considered at the same time to arrive at a certain compromise. Finally, different design scenarios impose harder conditions for different magnitudes (radial trajectories for transversal and heading errors, etc.) so that all scenarios should be taken into account. In Table 1, we indicate the arrangement of specifications as they will be considered in the design. Specifications s(·) are particularized for the three evaluation

Evolution Strategies to Design Tracking Filters

Figure 1: Design scenarios for tracking filter. (Scenarios 1-22 combine starting ranges from 15 to 230 NM, speeds of 150 or 300 m/s, and maneuver accelerations of 1.2, 2.5, or 6 m/s².)

Table 1: Arrangement of design specifications.

Scenario  PV_longitudinal  CV1_longitudinal  CV2_longitudinal  ...  PV_heading  CV1_heading  CV2_heading
   1        s(PV_11)         s(CV1_11)         s(CV2_11)       ...   s(PV_41)    s(CV1_41)    s(CV2_41)
   .           .                 .                 .            .       .            .            .
   j        s(PV_1j)         s(CV1_1j)         s(CV2_1j)       ...   s(PV_4j)    s(CV1_4j)    s(CV2_4j)
   .           .                 .                 .            .       .            .            .

points (PV_ij, CV1_ij, CV2_ij), for each assessed magnitude (i ∈ {longitudinal, transversal, groundspeed, heading}), and for each tested scenario (j = 1, . . . , 22). Therefore, the total number of specifications is 3 × 4 × 22 = 264.



Figure 2: Specifications on tracker performance for each assessed magnitude (meters). (The plot shows the transversal error in meters versus time in seconds, with the peak-value RMS specification PV and the convergence-value RMS specifications CV1 and CV2 marked on the bounding mask.)

Figure 3: Kalman filter to process measurements. (Block diagram: plots z[k] drive a prediction/update loop producing x̂[k], P[k] from x̂[k − 1], P[k − 1].)

3. IMM TRACKING FILTER FOR AIR TRAFFIC CONTROL

Since the specifications for ARTAS units require a very high quality of output, the tracker in the core will have to apply advanced filtering techniques (IMM filtering, joint probabilistic data association, etc.). In this section, we briefly describe the basic principles of IMM trackers, the structure proposed for this application, and the basic aspects of the design process.
3.1. General considerations
The IMM tracking methodology maintains a set of different dynamic models, each one matched to a specific type of motion pattern, and represents the target trajectory as a series of states, with the sequence of transitions modelled as a Markov chain. In our case, the states considered will be uniform motion, transversal maneuvers (both towards right and left), and longitudinal maneuvers. To estimate the target state (location, velocity, etc.), there is a bank of Kalman filters corresponding to the different motion models in the set, complemented with an estimation of the probabilities that the target is in each one of the possible states.
So, the elementary module in the tracking structure is a Kalman filter [3], which sequentially processes the measurements z[k], combining them with predictions computed according to the target dynamic model, to update the estimate of the target state and the associated covariance matrix, x̂[k] and P[k], respectively (see Figure 3).
The IMM maintains tracks conditioned to each jth motion state, with different Kalman filters, x̂_j[k], P_j[k], and an estimate of the probability that the target is in each of them, μ_j[k]. One of the basic elements in this methodology is the interacting process, which keeps all of them engaged with the most probable one. The structure considered in this work is shown in Figure 4, with four Kalman filters corresponding to the four motion states considered. It takes as input the target horizontal position measured at time instant k, z[k], and provides the estimate of the target position and kinematic state, together with the estimated covariance matrix of errors, x̂[k], P[k].
The IMM algorithm performs the following four steps to process the measurements received from the available sensors and estimate the target state: intermode interaction/mixing, prediction, updating, and combination for output.
(i) The tracking cycle for each received plot z[k] starts with the interaction phase, mixing the state estimates coming from each of the four models to obtain the new inputs x̂_0j[k] and P_0j[k]. So, the input to each Kalman filter is not directly its last update but a weighted combination of all modes taking into account the mode probabilities. This step is oriented to assure that the most probable mode dominates the rest.
(ii) Then, the prediction and updating phases are performed with the Kalman filter equations, according to the available models for target motion contained in each mode.
(iii) The estimated mode probabilities μ_j[k] are updated, based on two types of variables: the a priori transition probabilities of the Markov chain, p_ij, and the mode likelihoods, computed with the residuals between each plot and the mode predictions.
(iv) Finally, the mode probabilities are employed as weights to combine the partial tracks for the final output. Besides, each individual output and probability is internally stored to process plots coming in the future.
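The four steps above can be sketched, under the usual IMM equations with Gaussian mode likelihoods, as follows; this is a generic textbook formulation with hypothetical names and a shared measurement model, not the authors' implementation:

```python
import numpy as np

def imm_cycle(xs, Ps, mu, z, T, F, Q, H, R):
    """One IMM cycle over the four steps above: mixing, per-mode Kalman
    prediction/update, mode-probability update, and output combination.
    xs, Ps: per-mode estimates; mu: mode probabilities; T: matrix p_ij."""
    n, d = len(xs), len(z)
    c = T.T @ mu                          # predicted mode probabilities
    x0, P0 = [], []
    for j in range(n):                    # (i) interaction/mixing
        w = T[:, j] * mu / c[j]
        xm = sum(w[i] * xs[i] for i in range(n))
        Pm = sum(w[i] * (Ps[i] + np.outer(xs[i] - xm, xs[i] - xm))
                 for i in range(n))
        x0.append(xm)
        P0.append(Pm)
    like = np.zeros(n)
    for j in range(n):                    # (ii) Kalman prediction and update
        xp = F[j] @ x0[j]
        Pp = F[j] @ P0[j] @ F[j].T + Q[j]
        nu = z - H @ xp                   # residual between plot and prediction
        S = H @ Pp @ H.T + R
        K = Pp @ H.T @ np.linalg.inv(S)
        xs[j] = xp + K @ nu
        Ps[j] = (np.eye(len(xp)) - K @ H) @ Pp
        like[j] = (np.exp(-0.5 * nu @ np.linalg.solve(S, nu))
                   / np.sqrt((2 * np.pi) ** d * np.linalg.det(S)))
    mu = c * like / (c @ like)            # (iii) mode-probability update
    x = sum(mu[j] * xs[j] for j in range(n))   # (iv) combination for output
    P = sum(mu[j] * (Ps[j] + np.outer(xs[j] - x, xs[j] - x)) for j in range(n))
    return xs, Ps, mu, x, P
```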
3.2. Design of an IMM filter

The two basic aspects involved in the design of an IMM tracking system which determine its performance are the following: the number and type of models used in the set, and the transition parameters. The first aspect depends on each tracking problem, and we have selected, as seen in Section 3.1, a particular structure composed of four tracking modes reflecting the most representative situations in civil air traffic: constant velocity, turns to the right or left, and longitudinal accelerations. They correspond to target states 1, 2, 3, 4 in Figure 4. All modes interact within the IMM structure to achieve the most proper response for each situation. Mode 1 is a simple constant velocity model

Figure 4: IMM structure. (Block diagram: the interaction/combination stage mixes x̂_j[k − 1], P_j[k − 1] into inputs x̂_0j[k − 1], P_0j[k − 1] for the four Kalman filters, whose outputs and mode probabilities μ_j[k] are combined for the output x̂[k], P[k].)


Table 2: Parameters to adjust in the IMM design.

Parameter   Description
pUT         Transition probability between uniform motion and transversal acceleration
pUL         Transition probability between uniform motion and longitudinal acceleration
pTU         Transition probability between transversal acceleration and uniform motion
pLU         Transition probability between longitudinal acceleration and uniform motion
a_t         Typical transversal acceleration for the parametric circular models (modes 2, 3)
σ_t²        Plant noise variance for the parametric circular models (modes 2, 3)
σ_l²        Plant noise variance for the longitudinal model (mode 4)
with zero plant noise variance. The modes for tracking transversal maneuvers (turns), modes 2 and 3, are filters with circular extrapolation dynamics [4, 5], one for each possible direction. They provide a highly adaptive response to transversal transitions, the typical acceleration of the target when performing turns being one of the parameters to fix in this filter. Finally, mode 4 is a linear-extrapolation motion model with a plant noise component projected along the longitudinal direction. Since target deviations along the transversal direction are covered by the circular modes, this last model will quickly detect and adapt to variations in longitudinal velocity during accelerations and decelerations.
Each mode in the structure has its own parameters to tune, which must be adjusted in the design process. Besides, the transition probabilities between all possible pairs of modes, modelled as a Markov chain, are directly related to the rate of change from any mode to the rest. They have a very deep impact on the tracker behaviour during transitions and the purity of the output during each type of motion, so the design must also decide the most proper values for these parameters. Since there are four modes, the transition probability matrix is p_ij, each term being defined as the probability of the target arriving at state j at time k, given that the state at time k − 1 was i:

        | p11  p12  p13  p14 |
T[k] =  | p21  p22  p23  p24 |
        | p31  p32  p33  p34 |
        | p41  p42  p43  p44 |

        | 1 − pUT − pUL   0.5·pUT   0.5·pUT   pUL     |
     =  | pTU             1 − pTU   0         0       |
        | pTU             0         1 − pTU   0       |              (1)
        | pLU             0         0         1 − pLU |


The number of parameters has been simplified by considering as possible only the transitions between uniform motion and the rest of the modes. The parameters pUT, pUL are the probabilities of starting transversal and longitudinal maneuvers, given an aircraft in uniform motion, while the parameters pTU, pLU are the probabilities of transition to uniform motion, given that the aircraft is performing, respectively, transversal and longitudinal maneuvers.
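Equation (1) can be sketched as code, which is useful for checking that each row of T is a valid probability distribution; the function name and mode ordering (1: uniform, 2-3: transversal, 4: longitudinal) are our own conventions:

```python
import numpy as np

def transition_matrix(p_ut, p_ul, p_tu, p_lu):
    """Markov transition matrix of equation (1). Mode order: 1 uniform,
    2-3 transversal maneuver (one per turn direction), 4 longitudinal."""
    T = np.array([
        [1 - p_ut - p_ul, 0.5 * p_ut, 0.5 * p_ut, p_ul],
        [p_tu,            1 - p_tu,   0.0,        0.0],
        [p_tu,            0.0,        1 - p_tu,   0.0],
        [p_lu,            0.0,        0.0,        1 - p_lu],
    ])
    # every row must be a probability distribution over the next mode
    assert np.allclose(T.sum(axis=1), 1.0)
    return T
```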
It is important to notice that all parameters, those in each particular model plus the transition probabilities in the Markov chain, are completely coupled through the IMM algorithm, since the partial outputs from each mode are combined and fed back to all modes. So, there is a strongly nonlinear interaction between them, making the adjustment process certainly difficult. The whole set of parameters in the tracking structure is summarized in Table 2.
4. DESIGN OF FILTER PARAMETERS

The design of the particular IMM tracking structure addressed in this work, stated as adjusting the seven numeric input parameters to fit filter performance within ARTAS specifications, can be generally considered as a numerical optimization problem. We are searching for the proper combination of real input parameters that minimizes a real function assessing the quality of solutions as a cost, f : V ⊆ R⁷ → R. The final design solution x_d ∈ V should be a global minimum of f, which means that f(x_d) ≤ f(x) for any x ∈ V ⊆ R⁷. The subspace V stands for the region
of feasible solutions, defined as those vectors representing a valid IMM filter: parameters for probabilities must fall in the interval [0, 1] and parameters for variances must be positive. These are the only constraints to be satisfied by solutions during the search. Performance specifications are not considered as constraints here; instead, they will be used as penalty terms in the objective cost function. The cost would achieve a minimum value of zero only in the ideal case of a solution accomplishing all specifications, grading the rest of the possible cases with a positive global cost function that will be detailed later.
4.1. Evolution strategies
In numeric optimization problems, when f is a smooth, low-dimensional function, a number of classic optimization methods are available. The best case is that of low-dimensional analytical functions, where solutions can be analytically determined or found with simple sampling methods. If the partial derivatives of the function with respect to the input parameters are available, gradient-descent methods can be used to find the directions leading to a minimum. However, these gradient-descent methods quickly converge and stop at local minima, so additional steps must be added to find the global minimum. For instance, with a moderate number of local minima, we could run several gradient-descent solvers to find the best solution. The problem is that the number of similar local minima increases exponentially with dimensionality, making these types of solvers unfeasible. In our particular case, besides a high-dimensional input space causing

multimodal dependence, we do not have an analytical function to optimize. It is the result of a complex and exhaustive evaluation process implying the simulation and performance assessment of tracking structure on the whole set of
22 scenarios defined. The evaluation of a single point in the
input space requires several minutes of CPU time (Pentium
III, 700 MHz). Besides, the evaluation of quality after all simulations is not direct but it should take into account system
performance in all scenarios and magnitudes in comparison
with the whole table of specifications. As we will see later,
multiple specifications (or objectives) will increase the number of solutions with similar performance, increasing therefore the complexity of the search.
For complex domains, evolutionary algorithms have proven to be robust and efficient stochastic optimization methods, combining properties of volume- and path-oriented searching techniques. ES [6] are the evolutionary algorithms specifically conceived for numerical optimization, and they have been successfully applied to engineering optimization problems with real-valued vector representations [7]. They combine a search process which randomly scans the feasible region (exploration) with local optimization along certain paths (exploitation), achieving very acceptable rates of robustness and efficiency. Each solution to the problem is defined as an individual in a population, each individual codified as a pair of real-valued vectors: the searched parameters and the standard deviations of each parameter used in the search process. In this specific problem, one individual will represent the set of dynamic parameters in the IMM structure, as indicated in Table 2, (x1, . . . , x7), and their corresponding standard deviations (σ1, . . . , σ7).
The optimization search basically consists of evolving a population of individuals in order to find better solutions. The computational procedure of ES can be summarized in the following steps, according to the (μ + λ) strategy defined by Bäck and Schwefel [8], and particularized for our problem:
(1) generate an initial population with μ individuals uniformly distributed on the search space V;
(2) evaluate the objective value f(x_i), i = 1, . . . , μ, for each individual in the population;
(3) select the best parents in the population to generate a set of λ new individuals, by means of the genetic operators of recombination and mutation. In this case, recombination follows a canonical discrete recombination [6], and mutation is carried out as follows:

    σ'_i = σ_i · exp(N(0, Δσ)),
    x'_i = x_i + N(0, σ'_i),                              (2)

where x'_i and σ'_i are the mutated values and N(0, σ) stands for a normal distribution with zero mean and variance σ²;
(4) calculate the objective value f(x'_i), i = 1, . . . , λ, of the generated offspring, and select the μ best individuals of this new set containing parents and children to form the next generation;



(5) stop if the halting criterion is satisfied; otherwise, go to step (3).
We have implemented the ES for this problem with a size of (50 + 30) individuals (μ = 50 parents plus λ = 30 offspring) and mutation factor Δσ = 0.9. The fitness function will directly depend on the differences between the RMS values of the errors, evaluated through Monte Carlo simulation, and the ARTAS specifications for all scenarios and magnitudes, as will be detailed next. It is important to notice that the simulations are carried out using common random numbers to evaluate all individuals in all generations, enhancing system comparison within the optimization loop. In other words, the noise samples used to simulate all scenarios in the RMS evaluation are the same for each individual, in order to exploit the advantages coming from the use of a deterministic fitness function. Besides, the number of iterations was selected to guarantee that the confidence intervals of the estimated figures were short in relation to the estimated values.
A basic aspect to achieve successful optimization in any evolutionary algorithm is the control of diversity, and the appropriate level will depend on the problem landscape. If a population converges to a particular point in the search space too fast in relation to the roughness of its landscape, it is very probable that it will end in a local minimum. On the contrary, a too slow convergence will require a large computational effort to find the solution. ES give the highest importance to the mutation operator, achieving the interesting property of being self-adaptive in the sizes of the steps carried out during mutation, as indicated in step (3) of the algorithm above. Before selecting an algorithm for optimization, it is interesting to consider the point of view of the no free lunch (NFL) theorem [9], which asserts that no optimization procedure is better than a random search if the performance measurement consists in averaging over arbitrary fitness functions. The performance of ES has been widely analyzed on a set of well-known test functions [8, 10]. They are artificial analytical functions used as benchmarks for the comparison of representative properties of optimization techniques, such as convergence velocity under unimodal landscapes, robustness with multimodality, nonlinearity, constraints, presence of flat plateaus at different heights, and so forth. However, the performance on these test functions cannot be directly extrapolated to real engineering applications. The application of ES to a new problem, such as our complex IMM design against multiple specifications, where the landscape properties are not known (it is not even known whether there is a global minimum or not), is a challenge open to research.
4.2. Multiobjective optimization
The selection of the proper fitness function for this application is the problem-dependent feature with the highest impact on the algorithm (higher than ES parameters such as population size or mutation factor). Indeed, we should regard this design as a multiobjective optimization problem, where each individual objective is the minimization of the difference between the desired specification and the assessed performance

in each specific figure of merit. When a problem involves the simultaneous optimization of multiple, usually conflicting, objectives (or criteria), the goal is not so clear as in the case of single-objective optimization. The presence of different objectives generates a set of alternative solutions, defined as Pareto-optimal solutions [11]. The presence of conflicting multiple objectives means that different solutions cannot be directly compared and ranked to determine the best one; instead, the concept of domination is used for comparisons. A solution x1 is dominated by a second one, x2, if x2 is better than x1 simultaneously in all the objective functions considered. In any other case, they cannot be strictly compared. Taking into account this concept of domination, a Pareto-optimal set P is defined as a set of solutions such that there exists no solution in the search space dominating any member of P.
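The strict domination test described here (better simultaneously in all objectives) and the resulting nondominated set can be sketched directly; note that the common weak Pareto-domination definition ("no worse everywhere, strictly better somewhere") differs slightly from the strict one used in this text:

```python
import numpy as np

def dominates(f2, f1):
    """True if f2 dominates f1 under the strict definition in the text:
    f2 is simultaneously better (lower, for minimization) in all objectives."""
    return bool(np.all(np.asarray(f2) < np.asarray(f1)))

def pareto_set(points):
    """Nondominated subset of a collection of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```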
Some multiobjective optimization techniques have the double goal of guiding the search towards the global Pareto-optimal set and, at the same time, covering as many solutions as possible. Several evolutionary methods have been proposed [12] that address this goal by maintaining population diversity to cover the whole Pareto front. This implies first an enlargement of the population size, and then specific procedures to guarantee guiding the search to the desired optimal set with a well-distributed sample of the front. Among these procedures, we can mention methods such as selection by aggregation, switching the objectives during the selection phase to decide which individuals will appear in the mating pool, and so forth. Zitzler et al. [12] analyze and compare some of the most outstanding multiobjective evolutionary algorithms over some standard analytical test functions.
From the authors' point of view, the peculiarities of the problem dealt with, namely, the complexity and computational cost of the evaluation function together with the considerable number of specifications, preclude the application of techniques that derive the whole Pareto set. We have considered a weighted sum of partial goals to build a global fitness function:
    Minimize_x  Σ_i w_i f_i(x).                           (3)

As indicated by Deb [11], this type of approach with weighted sums converges to particular solutions of the Pareto front, corresponding to the tangential point in the direction defined by the vector of weights. The general idea is illustrated in Figure 5 for a simplified case with only two objective functions, f1 and f2. The shaded area is an example of the finite image set of the feasible region under the objective functions f1 and f2, with the set of nondominated solutions (the Pareto front, P) represented with a bold line. No solution in the image set has simultaneously lower values in f1 and f2 than any point in P. A pair of weights defines a direction for the search in the space of objective functions, leading to the tangential point for each solution.
However, a large number of specifications makes the weighted summation cumbersome, making it difficult for all objectives to be simultaneously considered to guide the search.


Figure 5: Solutions with a weighted sum method. (The sketch shows, in the (f1, f2) objective space, the Pareto-optimal front and the minima of w1·f1 + w2·f2 reached for two different weight vectors.)

    R(x) = { x,  x > 0,
           { 0,  x ≤ 0.                                   (4)

(ii) Different physical magnitudes (errors in position, heading, and groundspeed) have the same importance,


In our specific problem, we should fix a weighting vector with 264 components. A variation is proposed to reduce the number of objectives in the sum by exploiting knowledge about the problem. Basically, objectives with similar behavior are grouped in order to select a representative per group, the one with the worst value, which guarantees that all objectives in the group are represented in the final function. If we consider Table 1, with the whole set of specifications, we select the worst case for each column, leaving only 12 terms in the summation. It is important to notice that this maximum operation will break the linearity of the function with respect to the objectives and will make the landscape depend on each specific input vector. A trajectory of solutions in the search process may jump along different goal functions if the scenarios with the worst case change. The justification comes from the fact that each magnitude has a dependence on the input parameters that is similar in all scenarios, so a single representative is enough to be considered in the optimization. Besides, the selection of the worst case assures that, if the method can satisfy that term, all the scenarios will be simultaneously accomplished.
Taking into account this consideration, the fitness function, which assesses the quality of a solution as the degree
of attainment performance figures with respect to specifications, is presented next. The following details have also been
considered.
(i) It assesses the excess over the specification for each
performance figure, penalizing a solution as the error increases, but once the error is below the specification, the cost is zero. This is so because there is
no additional advantage if the RMSE decreases more
after the required values are attained. This is implemented for each magnitude by means of the expression R(pi s(pi )), where pi is the ith performance figure (RMSE), s(pi ) the specification, and R() the ramp
function:

20

10
5
0

Figure 6: Evolution of fitness and performance in each specific objective.

and so are normalized with the specification value,


defining a partial cost for ith figure,
%

ci = R

&

pi s(pi )
.
pi

(5)
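A minimal sketch of the ramp-based partial cost of (4)-(5), with made-up performance and specification values (these are illustrative, not ARTAS figures):

```python
# Partial cost in the spirit of (4)-(5): only the relative excess of a
# performance figure (RMSE) over its specification is penalized;
# meeting the specification costs zero.

def ramp(x):
    # R(x) = x for x > 0, and 0 otherwise
    return x if x > 0 else 0.0

def partial_cost(perf, spec):
    # relative excess over the specification
    return ramp((perf - spec) / spec)

print(partial_cost(120.0, 100.0))  # 20% over specification -> 0.2
print(partial_cost(80.0, 100.0))   # below specification -> 0.0
```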

(iii) In order to add some flexibility in the trade-off between maneuver and uniform-motion performances, weighting factors λ_t are included. They allow us to vary the priority of these performance figures, in the case where all of them cannot be attained at the same time, defining therefore a cost per jth scenario,
 

c(s_j) = Σ_{i=1..4} [ λ_PV R( (PV_ij − s(PV_ij)) / s(PV_ij) ) + λ_CV1 R( (CV1_ij − s(CV1_ij)) / s(CV1_ij) ) + λ_CV2 R( (CV2_ij − s(CV2_ij)) / s(CV2_ij) ) ],   (6)

where the subindex i represents each magnitude of interest (longitudinal, transversal, groundspeed, and heading) and j the scenario index.
(iv) Finally, considering the set E of all the scenarios where the performance figures are evaluated (in our example, the 22 scenarios indicated in Figure 1), the worst-case scenario j is selected, for each figure of merit and selected time

Evolution Strategies to Design Tracking Filters

[Figure panels: longitudinal error (m), transversal error (m), groundspeed error (m), and heading error versus time.]

Figure 7: Performance and ARTAS specifications for scenario 12.

instant (PV, CV1, and CV2). Therefore, the final goal function to be minimized is as follows:

Σ_{i=1..4} [ λ_PV max_{j∈E} R( (PV_ij − s(PV_ij)) / s(PV_ij) ) + λ_CV1 max_{j∈E} R( (CV1_ij − s(CV1_ij)) / s(CV1_ij) ) + λ_CV2 max_{j∈E} R( (CV2_ij − s(CV2_ij)) / s(CV2_ij) ) ].   (7)

So, this function considers the relative excesses over specifications for all performance figures, each one assessed in its worst-case scenario.
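The worst-case aggregation of (7) can be sketched as follows; the scenario error tables, specifications, and the single λ weight are made-up stand-ins for the ARTAS figures:

```python
# Goal function sketch in the spirit of (7): for each magnitude, take the
# worst (maximum) relative excess over its specification across the
# scenario set E, then combine the terms with the lambda weights.

def ramp(x):
    return x if x > 0 else 0.0

def rel_excess(perf, spec):
    return ramp((perf - spec) / spec)

def goal(perf_by_magnitude, spec, lam=1.0):
    # perf_by_magnitude: {magnitude: [RMSE per scenario in E]}
    total = 0.0
    for magnitude, values in perf_by_magnitude.items():
        worst = max(rel_excess(v, spec[magnitude]) for v in values)
        total += lam * worst
    return total

perf = {"longitudinal": [90.0, 110.0, 105.0],   # hypothetical RMSEs
        "heading": [4.0, 9.0, 6.0]}
spec = {"longitudinal": 100.0, "heading": 5.0}
print(goal(perf, spec))  # worst excesses 0.1 and 0.8 sum to ~0.9
```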
5. RESULTS

In this section, the results obtained along the optimization process to adjust the filter parameters according to ARTAS specifications are presented and analyzed. They have been obtained by particularizing expression (6) to the case of a weight of 1 for all magnitudes, λ_PV = λ_CV1 = λ_CV2 = 1.
First, Figure 6 summarizes the evolution of the best individual in the population (the one with the lowest fitness value), indicating graphically the accomplishment of specifications along the generations. Each design objective is represented by a row in the diagram, while the best individual for
each generation appears in each column. The grey level of
position (i, j) in the image indicates the quality of the fitting
to the ith specification of the best individual for the jth generation. The grey level represents linearly the relative excess
over the restriction (no excess is presented as white, 100%
or higher excess as black), which is the partial cost function
related with this constraint. Therefore, a completely white
column means that the optimization process has found a
set of parameters able to fulfil all design restrictions, while
a completely white row means that all best individuals in this
optimization exercise are able to fulfil the specification for

[Figure panels: longitudinal error (m), transversal error (m), groundspeed error (m), and heading error versus time.]

Figure 8: Performance and ARTAS specifications for scenario 13.

this magnitude and situation. Below, the fitness function computed from the whole set of partial costs, as indicated in Section 4, is plotted. This kind of figure serves not only to see the convergence of the optimization process graphically, but also to identify the most demanding performance criteria to be accomplished and to compare the suitability of a predefined tracking scheme (with some free design parameters) for a certain tracking problem. Applying exactly the same proposed methodology, we could have performed the optimization exercise with an alternative IMM structure, or even with a different tracking technique with open design parameters, and compared, after the design process, its capabilities against the specifications.
As can be seen, the optimization process makes the overall figure lighter from the initial generations (left) to the end of the optimization (right), achieving a trade-off point to accomplish as many specifications as possible. The highest improvement is achieved in the first 80 generations, with very slight modifications from that point until the end. The rows with a darker profile indicate higher difficulty in attaining that specification together with the rest. So, scenarios 12 and 13, corresponding to specifications 133-156, present the worst performance after the optimization. The specific performance values and ARTAS bounding masks for these scenarios, corresponding to transversal maneuvers at 215 NM, v = 300 m/s, a = 2.5 m/s² (scenario 12), and at 65 NM, v = 150 m/s, a = 6.0 m/s² (scenario 13), are indicated in Figures 7 and 8. The magnitudes with the worst performance are the transversal and heading errors (peak values) during transversal maneuvers. The peak value of heading error is the globally worst figure in the set, more than 100% over specification. Besides, as can be seen, the convergence error values for some of the magnitudes in these scenarios are practically tangent to the specifications, indicating that the optimization process has effectively considered all of them to arrive at the final trade-off solution. So, this method selects the parameters adapting the system behavior to the bounding mask. This is apparent not only for the

presented scenarios with worst cases, but for all design scenarios as well.

Different runs of the global optimization process (using different random seeds to generate the individuals in the initial population) were carried out to analyze the consistency of the solutions obtained. The results of ten independent runs are indicated in Figure 9, presenting only the best individual in the population after optimization (instead of the whole evolution process) and the final values of fitness achieved.

Figure 9: Evolution of fitness and performance in each specific objective.

As can be seen, different runs led to solutions quite consistent in terms of overall fitness and of which specifications present problems to the filter (always those in scenarios 12 and 13). However, the specific vector solutions found after optimization in each run had significant differences, indicating that the fitness function probably has a multimodal landscape, even after having selected a particular set of weighting factors among specifications, λ = 1.

Since it is not possible to represent a fitness landscape with seven dimensions, the following analysis was carried out. The three solutions with the closest fitness values, resulting from runs 1, 2, and 5, were selected to be combined and to generate a grid of linear combinations (convex hull) as follows:

x = x_1 + α (x_2 − x_1) + β (x_5 − x_1).   (8)

Figure 10: Fitness landscape for linear combinations of three solutions.

Figure 11: Fitness landscape projection over the horizontal plane and the path β = 0.
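The grid of linear combinations in (8) can be sketched as follows (a hypothetical example; plain lists stand in for the 7-dimensional parameter vectors of the actual filter):

```python
# Landscape slice in the spirit of (8): solutions x1, x2, x5 span a
# plane, and a grid over (alpha, beta) samples the combinations
# x = x1 + alpha*(x2 - x1) + beta*(x5 - x1).

def combine(x1, x2, x5, alpha, beta):
    return [a + alpha * (b - a) + beta * (c - a)
            for a, b, c in zip(x1, x2, x5)]

def grid(lo=-0.5, hi=1.5, step=0.1):
    n = int(round((hi - lo) / step))
    return [lo + k * step for k in range(n + 1)]

# Toy 2D stand-ins; solutions 1, 2, 5 sit at (0,0), (1,0), (0,1).
x1, x2, x5 = [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]
points = [combine(x1, x2, x5, a, b) for a in grid() for b in grid()]
print(len(points))  # 21 x 21 = 441 sampled combinations
```

In practice, each grid point would be re-evaluated with the Monte Carlo fitness to draw a landscape like the one in Figure 10.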

The fitness landscape for a grid with α, β varying in the interval [−0.5, 1.5], in steps of 0.1 units, is indicated in Figure 10. It can be seen that the fitness is practically flat over this particular region of the search space represented by linear combinations



Figure 12: Fitness landscape projections over the paths α = 0 (connecting solutions 1 and 5) and α + β = 1 (connecting solutions 2 and 5).

(really, there are many more solutions with similar fitness values than are presented in Figure 10; these are only particular cases of linear combinations). The particular solutions 1, 2, and 5 correspond to the points (0, 0), (1, 0), and (0, 1) in the (α, β) plane. In Figure 11, the projection on the (α, β) plane is presented with grey levels, where the feasible region can be clearly separated. All solutions within the convex combinations of the solutions (α, β ∈ [0, 1]) are feasible. Besides, the 2D graphs corresponding to the paths connecting all pairs of solutions are presented in Figures 11 and 12. As can be seen, the solutions found by the algorithm are effectively local minima of the fitness function, in spite of the fact that the function is almost flat in this region of convex linear combinations. This shows that the algorithm is capable of finding appropriate solutions, and confirms the fact that we have a multimodal function even after having combined the multiple restrictions into a scalar function. Different runs arrived at different local minima in a region where the relative difference between minima can be practically neglected, so all solutions can be taken as good design points for the adopted criterion. The algorithm was carried out with different criteria (for instance, the penalty of RMSE peak values being ten times higher than convergence RMSE), achieving results consistent with the preferences: all specifications with the highest priority were accomplished first, leading to higher errors in the other specifications.
6. CONCLUSION

In this paper, we have described a methodology based on ES for the design of IMM-tracker techniques to accomplish a considerably large set of predefined specifications.

An exhaustive set of test scenarios, with performance specifications for each, and a specific IMM structure with open parameters are the input to the solver. The procedure may be summarized as performing an optimization over the parameter space, using ES, defining as the fitness function a combination of partial excesses over specifications that takes into account some knowledge about the problem in the form described in Section 4. This fitness function summarizes the attainment of all accuracy statistics of interest for the different times of interest (steady state, start and end of maneuvers, etc.) in all design scenarios. The evaluation involved the costly Monte Carlo simulation, as specified by ARTAS, to calculate the accuracy statistics, although the methodology is open to the inclusion of other possible evaluation methods for IMM tracking filters, such as the one described in [9].
This method has been successfully used in a monoradar application, leading to a significant improvement over previous nonsystematic approaches to the same problem. Even more, the form of the fitness function described serves as a method for relaxing constraints: those more important to us are given a higher weight in (6), and those less important a lower weight.
REFERENCES
[1] H. A. Blom and Y. Bar-Shalom, "The interacting multiple model algorithm for systems with Markovian switching coefficients," IEEE Trans. Automatic Control, vol. 33, no. 8, pp. 780–783, 1988.
[2] EUROCONTROL, "Functional and performance specification of ARTAS, Version 2.6," http://www.eurocontrol.int/artas/public system support/online doc request/online doc request summary.htm.
[3] Y. Bar-Shalom and X. R. Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publishing, Danvers, Mass, USA, 1995.
[4] N. Nabaa and R. H. Bishop, "Validation and comparison of coordinated turn aircraft maneuver models," IEEE Trans. on Aerospace and Electronic Systems, vol. 36, no. 1, pp. 250–259, 2000.
[5] K. Kastella and M. Biscuso, "Tracking algorithms for air traffic control applications," Air Traffic Control Quarterly, vol. 3, no. 1, pp. 19–43, 1995.

[6] H. P. Schwefel, Numerical Optimisation of Computer Models, John Wiley & Sons, New York, NY, USA, 1981.
[7] I. Rechenberg, "Evolution strategy: Nature's way of optimization," in Optimization: Methods and Applications, Possibilities and Limitations, H. W. Bergmann, Ed., Lecture Notes in Engineering, pp. 106–126, Springer, Berlin, Germany, 1989.
[8] T. Bäck, Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York, NY, USA, 1996.
[9] D. H. Wolpert and W. G. Macready, "No-free-lunch theorems for optimization," IEEE Trans. on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.
[10] K. Ohkura, Y. Matsumura, and K. Ueda, "Robust evolution strategies," Applied Intelligence, vol. 15, no. 3, pp. 153–169, 2001.
[11] K. Deb, "Evolutionary algorithms for multi-criterion optimization in engineering design," in Evolutionary Algorithms in Engineering and Computer Science, John Wiley & Sons, Chichester, UK, 1999, Chapter 8.
[12] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.
Jesús García Herrero received his Master degree in telecommunication engineering from Universidad Politécnica de Madrid (UPM) in 1996 and his Ph.D. degree from the same university in 2001. He has been working as a Lecturer at the Department of Computer Science, Universidad Carlos III de Madrid, since 2000. There, he is also integrated in the Systems, Complex and Adaptive Laboratory, involved in artificial intelligence applications. His main interests are radar data processing, navigation, and air traffic management, with special stress on data fusion for airport environments. He has also worked in the Signal Processing and Simulation Group of UPM since 1995, participating in several national and European research projects related to air traffic control.
Juan A. Besada Portas received his Master degree in telecommunication engineering from Universidad Politécnica de Madrid (UPM) in 1996 and his Ph.D. degree from the same university in 2001. He has worked in the Signal Processing and Simulation Group of the same university since 1995, participating in several national and European projects related to air traffic control. He is currently an Associate Professor at Universidad Politécnica de Madrid (UPM). His main interests are air traffic control, navigation, and data fusion.
Antonio Berlanga de Jesús received his B.S. degree in physics from Universidad Autónoma, Madrid, Spain in 1995, and his Ph.D. degree in computer engineering from Universidad Carlos III de Madrid in 2000. Since 2002, he has been there as an Assistant Professor of automata theory and programming language translation. His main research topics are evolutionary computation applications and network optimization using soft computing.


José M. Molina López received his Master degree in telecommunication engineering from Universidad Politécnica de Madrid (UPM) in 1993 and his Ph.D. degree from the same university in 1997. He is an Associate Professor at Universidad Carlos III de Madrid. His current research focuses on the application of soft computing techniques (NN, evolutionary computation, fuzzy logic, and multiagent systems) to radar data processing, navigation, and air traffic management. He joined the Computer Science Department of Universidad Carlos III de Madrid in 1993, being enrolled in the Systems, Complex, and Adaptive Laboratory. He has also worked in the Signal Processing and Simulation Group of UPM since 1992, participating in several national and European projects related to air traffic control. He is the author of up to 10 journal papers and 70 conference papers.
Gonzalo de Miguel Vela received his telecommunication engineering degree in 1989 and his Ph.D. degree in 1994 from Universidad Politécnica de Madrid. He is currently a Professor in the Department of Signals, Systems, and Radiocommunications of the same university and is a member of the Data Processing and Simulation Research Group at the Telecommunication School. His fields of interest and activity are radar signal processing and data processing for air traffic control applications.
José R. Casar Corredera received his graduate degree in telecommunications engineering in 1981 and his Ph.D. degree in 1983 from the Universidad Politécnica de Madrid (UPM). He is a Full Professor in the Department of Signals, Systems, and Radiocommunications of UPM. At the present time, he is Adjunct to the Rector for Strategic Programs and Head of the Signal and Data Processing Group at the same university. His research interests include radar technologies, signal and data processing, multisensor fusion, and image analysis both for civil and defence applications. During 1993, he was Vice Dean for Studies and Research at the Telecommunications Engineering School of UPM. During 1995, he was Deputy Vice President for Research at UPM, and from 1996 to February 2000, Vice President for Research at UPM.

EURASIP Journal on Applied Signal Processing 2003:8, 780–790
© 2003 Hindawi Publishing Corporation

Tuning Range Image Segmentation by Genetic Algorithm

Gianluca Pignalberi
Dipartimento di Informatica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy
Email: pignalbe@dsi.uniroma1.it

Rita Cucchiara
Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Via Vignolese 905, 41100 Modena, Italy
Email: rita.cucchiara@unimo.it

Luigi Cinque
Dipartimento di Informatica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy
Email: cinque@dsi.uniroma1.it

Stefano Levialdi
Dipartimento di Informatica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy
Email: levialdi@dsi.uniroma1.it
Received 1 July 2002 and in revised form 19 November 2002
Several range image segmentation algorithms have been proposed, each one to be tuned by a number of parameters in order to provide accurate results on a given class of images. Segmentation parameters are generally affected by the type of surfaces (e.g., planar versus curved) and the nature of the acquisition system (e.g., laser range finders or structured light scanners). It is impossible to answer the question, which is the best set of parameters given a range image within a class and a range segmentation algorithm? Systems proposing such a parameter optimization are often based either on careful selection or on solution space-partitioning methods. Their main drawback is that they have to limit their search to a subset of the solution space to provide an answer in acceptable time. In order to provide a different automated method to search a larger solution space, and possibly to answer the above question more effectively, we propose a tuning system based on genetic algorithms. A complete set of tests was performed over a range of different images and with different segmentation algorithms. Our system provided a particularly high degree of effectiveness in terms of segmentation quality and search time.
Keywords and phrases: range images, segmentation, genetic algorithms.

1. INTRODUCTION
Image segmentation problems can be approached with several solution methods. The range image segmentation subfield has been addressed in different ways. But, since an algorithm should work correctly for a large number of images in a class, such a program is normally characterized by a high number of tuning parameters in order to obtain a correct, or at least satisfactory, segmentation.

Usually the correct set of parameters is given by the developers of the segmentation algorithm, and it is expected to give satisfactory segmentations for the images in the class used to tune the parameters. But it is possible that, given a changing input image class, the results are not satisfactory. To avoid exhaustive test tuning, an expert system to tune the parameters should be proposed. In this way, it should be possible to easily direct the chosen segmentation algorithm to work correctly with a chosen class of images.

Several expert systems have been proposed by other teams. We can quote [1], which performs the tuning of a color image segmentation algorithm by a genetic algorithm (GA). The same technique can be applied to range segmentation algorithms. Up till now, only techniques that partition the parameter space and work on successive approximations have been used (such as in [2, 3, 4, 5]). Such techniques obtain results similar to those provided by the algorithm team's tuning.

In this paper, we propose a tuning system based on GAs. To prove the validity of this method, we will show results obtained using well-tuned segmentation algorithms for range images (in particular the ones proposed at the University of


Bern and the University of South Florida). Genetic solutions are evaluated according to a fitness function that accounts for different types of errors such as under/oversegmentation or miss-segmentation.

The paper is organized as follows. In Section 2, we summarize the related works. In Section 3, we describe our approach in detail. In Section 4, we show the experimental results, while in Section 5, we present our conclusions.
2. RELATED WORKS

2.1. Range image segmentation

Range images are colored according to the distance from the sensor that scans the image. In fact, each pixel in a range image indicates the value of the distance from the sensor to the foreground object point. Image segmentation is the refinement of an image into patches corresponding to the represented regions. So a range image segmentation algorithm aims at partitioning and labeling range images into surface patches that correspond to surfaces of 3D objects.

Surface segmentation is still a challenging problem. Currently, many different approaches have been proposed. The known algorithms devoted to range segmentation may be subdivided into at least three broad categories [6]:

(1) those based on a region-growing strategy,
(2) those based on a clustering method,
(3) those based on edge detection and completion followed by surface filling.

Many algorithms addressing range segmentation have been proposed. In [6], there is a complete analysis of four segmentation algorithms, from the University of South Florida (USF), the University of Bern (UB), the Washington State University (WSU), and the University of Edinburgh (UE). The authors show that a careful parameter tuning has to be performed according to the chosen segmentation algorithm and image set. Such algorithms are based on the above methods, and show different performances and results in terms of segmentation quality and segmentation time.

Jiang and Bunke [7] describe an evolution of the segmentation algorithm built at the University of Bern, and in [5], the same segmentation algorithm is used for other tests. Recently, a different segmentation algorithm was presented, based on the scan-line grouping technique [8], but using a region-growing strategy and showing good segmentation results and a quasi-real-time computation capability. Zhang et al. [9] presented two algorithms, both edge based, segmenting noisy range images. With these algorithms, the authors investigated the use of intensity edge maps (IEMs) in noisy range image segmentation, and the results were compared against the corresponding ones obtained without using IEMs. Such algorithms use watershed and scan-line grouping techniques. Chang and Park [10] proposed a segmentation of range images based on the fusion of range and intensity images, where the estimation of parameters for surface patch representation is performed by a least-trimmed squares (LTS) method. Baccar et al. [11] describe a method

to extract, via classification, edges from noisy range images. Several algorithms (particularly color segmentation algorithms) are described or summarized in [12].

Parameter tuning is still a main task, and a possible solution is proposed. A different method to tune parameter sets is given by Min et al. in [2, 3, 4]. Its main drawback seems to be that only a limited subset of the complete solution space is allowed to be explored, which exposes the method to the possibility of missing the global optimum or a good enough local optimum. But such a method is fast and efficient enough to represent a fine-tuning step: given a set of rough local suboptima, the algorithm proposed in [2] could quickly explore a limited space around these suboptima to reach, if they exist, local optima.

In [6], for the first time, an objective performance comparison of range segmentation algorithms has been proposed. Further results on such comparison have been proposed in [3, 4, 13, 14]. Another comparison has been presented in [15], where another range segmentation algorithm is proposed. This is based on a robust clustering method (used also for other tasks). But the need for tuning algorithm parameters is still present.
2.2. Genetic algorithms and their application to image segmentation

The GA is a well-known, widespread technique for exploring a solution space in parallel by encoding the concept of evolution in the algorithmic search: from a population of individuals representing possible problem solutions, evolution is carried out by means of selection and reproduction of new solutions. The basic principles of GAs are now well known. Often-quoted references are the books of Goldberg [16] and Michalewicz [17]; a survey is presented in [18], while a detailed explanation of a basic GA for solving NP-hard optimization problems, presented by Bhanu et al., can be found in [1].

Many GA-driven segmentation algorithms have been proposed in the literature; in particular, an interesting solution was presented by Yu et al. [19], an algorithm that can segment and reconstruct range images via a method called RESC (RESidual Consensus). Chun and Yang [20] presented an intensity image segmentation by a GA exploiting split-and-merge strategies; and Andrey and Tarroux [21] proposed an algorithm which can segment intensity images by including production rules in the chromosome, that is, a data string representing all the possible features present in a population member. Methods for segmenting textured images are described by Yoshimura and Oe [22] and Tseng and Lai [23]. The first one adopts a small region-representing chromosome, while the second one uses GAs to improve the iterated conditional modes (ICM) algorithm [24]. Cagnoni et al. [25] presented a GA based on a small set of manually traced contours of the structure of interest (anatomical structures in three-dimensional medical images). The method combines the good trade-off between simplicity and versatility offered by polynomial filters with the regularization properties that characterize elastic-contour models. Andrey [26] proposed another interesting work, in which the image to be

segmented is considered as an artificial environment. In it, regions with different characteristics are presented as a set of ecological niches. A GA is then used to evolve a population distributed all over this environment. The GA-driven evolution leads distinct species to spread over different niches. Consequently, the distribution of the various species at the end of the run unravels the location of the homogeneous regions in the original image. The method has been called selectionist relaxation because the segmentation emerges as a by-product of a relaxation process [27] mainly driven by selection.

As previously stated, the algorithm presented in [1] tunes a color-image segmentation algorithm, namely, Phoenix [28], by a chromosome formed by the program parameters, and not by image characteristics as in [19, 20, 21]. A complete survey on GAs used in image processing is the one compiled by Alander [29].
3. GASE: GENETIC ALGORITHM SEGMENTATION ENVIRONMENT

Using the same rationale as in [1], we adopted a GA for tuning the set of parameters of a range segmentation algorithm. Different approaches to the tuning of parameters could be represented by evolutionary programming (EP) and evolution strategy (ES).

The first one places emphasis on the behavioral linkage between parents and their offspring (the solutions). Each solution is replicated into a new population and is mutated according to a distribution of mutation types. Each offspring solution is assessed by computing its fitness. Similarly, the second one tries random changes in the parameters defining the solution, following the example of natural mutations. Like both ES and EP, the GA is a useful method of optimization when other techniques, such as gradient descent or direct analytical discovery, are not possible. Combinatoric and real-valued function optimization problems in which the optimization surface or fitness landscape is rugged, possessing many locally optimal solutions, are well suited for GAs.

We chose the GA because it is a well-tested method in image segmentation and a good starting point to explore the evolutionary framework. Because of the universal model, we have the possibility of changing the segmentation algorithm with few consequent changes in the GA code. These changes mainly involve the chromosome composition and the generation definition. The fitness evaluation has been modeled for the problem of range segmentation and can be kept constant, as can the reproduction model. This is one of the features of our proposal, which we called GASE, or genetic algorithm segmentation environment (introduced as GASP in [30]).

The main goal of GASE is to suggest a signature for a class of images, that is, the best-fitted set of parameters performing the optimal segmentation. In this way, when our system finds a good segmentation for an image or for a particular surface, we can say that the same parameters will work correctly for the same class of images or for the same class of surfaces (i.e., all the surfaces presenting a big curvature radius).

3.1. The GASE architecture

In Figure 1, we show the architecture of our system. Following the block diagram, we see that an input image I_i is
first segmented by a program s (range segmentation algorithm) with a parameter set s_j, producing a new image having labeled surface patches M_{i,s_j}. All such segmented images
are stored in a database that we call the phenotype repository.
Briefly, we may write

M_{i,s_j} = segmentation(s, s_j, I_i). (1)

The quality of the segmentation process may be assessed by
means of the so-called fitness evaluation (in the block genetic-based learning), computing a score F_{i,s_j} by comparing the
segmented image M_{i,s_j} with the ground truth segmented image G_i. We assume that our fitness function evaluates a
cost and is therefore positively valued (or zero valued if the segmented image coincides exactly with the ground truth one).
Thus

F_{i,s_j} = fitness(M_{i,s_j}, G_i), F_{i,s_j} >= 0. (2)
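As a concrete illustration of equations (1) and (2), the loop below sketches the evaluate-one-phenotype step in plain Python; the thresholding segmenter, its "level" parameter, and the pixel-mismatch cost are invented stand-ins, not the actual USF or UB algorithms.

```python
def segmentation(s, params, image):
    # Stand-in segmenter (s is unused here): label pixels by thresholding
    # against a hypothetical "level" parameter from the parameter set.
    t = params["level"]
    return [[1 if v > t else 0 for v in row] for row in image]

def fitness(segmented, ground_truth):
    # A cost in the sense of equation (2): number of mismatching pixel
    # labels, zero when the segmentation coincides with the ground truth.
    return sum(m != g
               for seg_row, gt_row in zip(segmented, ground_truth)
               for m, g in zip(seg_row, gt_row))

I_i = [[0.2, 0.8], [0.9, 0.1]]               # input range image (fake values)
G_i = [[0, 1], [1, 0]]                       # ground truth labeling
M = segmentation("s", {"level": 0.5}, I_i)   # equation (1)
F = fitness(M, G_i)                          # equation (2), always >= 0
```

Any cost that is zero exactly on a perfect match would play the same role here.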

This process is carried out for all available images with different parameter sets. The sets that produce the best results
(called w) are stored in the so-called final genotype repository (if the fitness function is under a given threshold). Once
the score is assigned, a tuple P_{ij} containing the genotype,
the score value, the phenotype identifier, and the generation (s_j, F_{i,s_j}, ij, k) is written in a database called the evaluation
repository. The genetic computation selects two individuals to be coupled among the living ones (mating individuals
selection); these genotypes are processed by the crossover
block, which outputs one or more offspring that may be mutated. The generated individuals will be the new genotypes
s_j in the next generation step.
At the end of a generation, a to-be-deleted individuals
selection is performed. The decision on which individuals
are to be erased from the evaluation repository is made by
fixing a killing probability p_k depending on the fitness and
the age of the individuals (their k value). If an individual has
a score greater than p_k, the solution it represents will no
longer be considered. In this way, we keep a limited number of
evaluated points in the solution space.
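A minimal sketch of this deletion step, assuming (since the exact form of p_k is not given here) that p_k grows linearly with the cost-like fitness and with the age k; the coefficients and the toy population are invented.

```python
import random

def cull(population, rng, alpha=0.05, beta=0.02):
    # population entries are (genotype, fitness_cost, age_k) tuples.
    survivors = []
    for genotype, fit, age in population:
        p_k = min(1.0, alpha * fit + beta * age)   # assumed form of p_k
        if rng.random() >= p_k:                    # survive with prob. 1 - p_k
            survivors.append((genotype, fit, age))
    return survivors

rng = random.Random(0)
pop = [("a", 0.0, 0),      # fresh, perfectly scored: p_k = 0, always kept
       ("b", 30.0, 40)]    # old and badly scored: p_k = 1, always culled
kept = cull(pop, rng)
```

Tying p_k to both score and age keeps the evaluation repository bounded, as the text requires.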
3.2. GASE features

When building a GA, some features have to be specifically
designed. Among others, we mention the fitness function,
the chromosome, described in Sections 3.3 and 3.4, and the
crossover.
The fitness function is a heuristic function that indicates
to the GA whether or not an individual fits the environment.
The chromosome is the data structure that contains the characters of the individuals. The crossover is the method that determines how the parents' characteristics are inherited by the children.
For this work, we used modified versions of multiple-point
crossover [31] and uniform crossover [32], as described in
[30].

Tuning Range Image Segmentation by Genetic Algorithm
Figure 1: GA architecture for range image segmentation.

3.3. Fitness function


The most critical step in the genetic evolution process is
the definition of a reliable fitness function which ensures
monotonicity with respect to the improvement provided
by changing the segmentation parameters. The fitness function could be used for comparing both different algorithms
and different parameter sets within the same algorithm. In
[6] the problem of comparing range segmentation algorithms has been thoroughly analyzed; nevertheless, the authors' evaluations take into account a number of separate
performance figures and no global merit value is provided.
More precisely, the authors consider five figures that are
functions of a precision percentage:

(1) correct segmentation,
(2) oversegmentation,
(3) undersegmentation,
(4) miss-segmentation,
(5) noise segmentation.

Conversely, we need a single value which will
then guide our feedback loop within the optimization process, and therefore we define a unique performance value
specifically accounting for all points. In [33] and in [34], a
function assigning a scalar to a segmentation is used. In particular, in [34] that function is the probability of error between the ground truth and the machine-segmented image.
But such a way of assessing fitness is judged not suitable in [6].
This means that a more robust way to obtain a scalar could
be to order a vector of properties. Of course, the ordering of
vectors is not straightforward without using particular techniques; one of them could be to adopt a weighted sum of the
components.
We define the fitness function as a weighted sum of a
number of components:

F = w_1·C + w_2·H_u + w_3·H_o + w_4·U, with Σ_{i=1}^{4} w_i = 1, (3)

where w_1, w_2, w_3, and w_4 are tuned to weigh the single components differently.
The fitness takes into account two levels of errors (and
is therefore a cost to be minimized): the former is a measure
at the pixel level, computed with a pixel-by-pixel comparison; the
latter is a measure at the surface level, considering the number of
computed surfaces. At the pixel level, C is the cost associated
with erroneously segmented pixels and U accounts for unsegmented pixels. At the surface level, we add two factors
(handicaps), one due to undersegmentation (H_u) and one
due to oversegmentation (H_o).
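Equation (3) amounts to the following one-liner; the weights and component values below are made-up examples, with only the constraint that the w_i sum to 1 taken from the text.

```python
def combine(C, Hu, Ho, U, weights=(0.4, 0.2, 0.2, 0.2)):
    # Convex combination of the four error components of equation (3).
    w1, w2, w3, w4 = weights
    assert abs(w1 + w2 + w3 + w4 - 1.0) < 1e-9   # the w_i must sum to 1
    return w1 * C + w2 * Hu + w3 * Ho + w4 * U

F = combine(C=10.0, Hu=2.0, Ho=4.0, U=1.0)
```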
Let G be the ground truth image, having N_G regions
called R_Gi composed of P_Gi pixels, i = 1, ..., N_G, and let MS
be the machine-segmented image, having N_M regions called
R_Mj composed of P_Mj pixels, j = 1, ..., N_M. We define the
overlap map O so that

O_ij = #(overlapping pixels of R_Gi and R_Mj), (4)

where #() indicates the number of (). The number of pixels
with the same coordinates in the two regions is the value O_ij.


The expression (4) could be written as O_ij = R_Gi ∩ R_Mj. It
is straightforward that if there is no overlap between the two
regions, O_ij = 0, while in case of complete overlap, O_ij =
P_Gi = P_Mj.
Starting from O_ij, we search the index x_j for all R_Mj, x_j =
argmax_{i=1,...,N_G}(O_ij), to compute the cost C:

C = Σ_{j=1}^{N_M} (P_{G,x_j} − O_{x_j,j}) / N_M. (5)

In other words, C should be a kind of distance between the
real and the ideal segmentation at the pixel level.
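Equations (4) and (5) can be sketched on a toy pair of label images; the 2 × 4 images and the region labels below are invented for illustration.

```python
from collections import Counter

# G and M hold one region label per pixel.  O[(i, j)] counts pixels where
# ground-truth region i overlaps machine region j (equation (4)); C then
# averages, over machine regions, the pixels of the best-matching
# ground-truth region that the machine region leaves uncovered (eq. (5)).
G = [[1, 1, 2, 2],
     [1, 1, 2, 2]]
M = [[1, 1, 2, 2],
     [1, 2, 2, 2]]

O = Counter()                      # overlap map
P_G = Counter()                    # P_Gi = size of ground-truth region i
for g_row, m_row in zip(G, M):
    for g, m in zip(g_row, m_row):
        O[(g, m)] += 1
        P_G[g] += 1

gt_labels = sorted(P_G)
machine_labels = sorted({m for row in M for m in row})
N_M = len(machine_labels)

C = 0.0
for j in machine_labels:
    x_j = max(gt_labels, key=lambda i: O[(i, j)])   # best-matching R_Gi
    C += (P_G[x_j] - O[(x_j, j)]) / N_M
```

Here machine region 1 misses one pixel of its best match, so C = (1 + 0)/2 = 0.5.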
The term U accounts for the unlabeled pixels, that is,
those pixels that at the end of the process do not belong to
any region (this holds only for the USF segmentation algorithm, since the UB segmentation algorithm allocates all unlabeled pixels to the background region):

U = Σ_{i=1}^{N_G} ( P_{Gi} − Σ_{j=1}^{N_M} O_ij ). (6)

Then we can create another (boolean) matching map with
entries m_ij so that

m_ij = 1 if j = argmax_{j=1,...,N_M} O_ij, and m_ij = 0 otherwise. (7)

The handicap H_u accounts for the number of undersegmented regions (those which appear in the resulting image as a whole whilst separated in the ground truth image):

H_u = k · #{ R_Mj : Σ_{i=1}^{N_G} m_ij > 1, j = 1, ..., N_M }. (8)

In fact, in each row i of the matching map, only one entry
is set to 1, while more entries in a column can be set to 1
if undersegmentation occurs and a segmented region covers
more ground truth regions.
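A sketch of equations (7) and (8) under that reading (exactly one 1 per row of m); the labels, overlap counts, and the constant k are invented.

```python
# Ground-truth regions 1 and 2 both overlap machine region 1 the most,
# so machine region 1 merged them: undersegmentation.
k = 10
gt_labels = [1, 2, 3]
machine_labels = [1, 2]
O = {(1, 1): 5, (1, 2): 0,
     (2, 1): 4, (2, 2): 1,
     (3, 1): 0, (3, 2): 6}

m = {}
for i in gt_labels:                      # equation (7): one 1 per row
    best_j = max(machine_labels, key=lambda j: O[(i, j)])
    for j in machine_labels:
        m[(i, j)] = 1 if j == best_j else 0

# Equation (8): machine regions matched by more than one GT region.
Hu = k * sum(1 for j in machine_labels
             if sum(m[(i, j)] for i in gt_labels) > 1)
```

Column j = 1 holds two 1s, so one undersegmented region is counted and Hu = k.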
Finally, H_o is a handicap accounting for the number of
oversegmented regions (those which appear in the ground truth
image as a whole whilst split in the resulting image):

H_o = k · #{ R_Mj : Σ_{i=1}^{N_G} m_ij = 0, j = 1, ..., N_M }. (9)

The handicaps H_o and H_u are both multiplied by a constant k just to enlarge the variability range.
Some results about the effectiveness of the adopted fitness
function have been presented in [35].

3.4. Coding the chromosomes

One of the main tasks in GASE was to code the chromosome,
that is, to code the parameter set for a given segmentation
algorithm.
To simplify the generation of new solutions by a correct
chromosome manipulation, we should use a binary coding, but since some genes (i.e., parameters) can assume
real values, this coding is not sufficient. So we decided to
adopt an extended logical binary coding in order to represent real values with a fixed-point code (with a defined number of decimals). Thus we define the symbol set as {0, 1, dot}
to allow a representation (of fixed but arbitrary precision)
of the decimals of the number. The choice of a fixed precision could seem wrong, but we can consider that, beyond
a certain precision, segmentation algorithm performances
are not affected. We could have used a floating-point representation of the chromosome, as suggested in [36], but in
the case we studied, a fixed-point representation seems to
be sufficient. The binary strings are formed by the juxtaposition of BCD-coded genes, which is memory consuming but gives accuracy in conversion to and from decimal. The choice of
extending the symbol set to include dot was a help for visual inspection of the created population databases (listed in
Figure 1).
Our chromosome contains all the parameters (their
meanings are listed in Tables 1 and 2) of the chosen segmentation algorithm. In this way, the solution spaces considered
are n-dimensional, with n = 5 for USF and n = 10 for UB.

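A hedged sketch of this gene coding: each decimal digit is BCD-encoded on four bits and the decimal point survives as the literal dot symbol of the extended alphabet {0, 1, dot}; the helper names and the two-decimal precision are our choices, not taken from the text.

```python
def encode_gene(value, decimals=2):
    # Fixed-point (not floating-point) representation of one parameter:
    # every decimal digit becomes a 4-bit BCD group, "." stays literal.
    text = f"{value:.{decimals}f}"
    return "".join(ch if ch == "." else format(int(ch), "04b")
                   for ch in text)

def decode_gene(bits):
    out = []
    while bits:
        if bits[0] == ".":
            out.append(".")
            bits = bits[1:]
        else:
            out.append(str(int(bits[:4], 2)))   # one BCD digit back
            bits = bits[4:]
    return float("".join(out))

g = encode_gene(7.5)        # BCD digits 7, 5, 0 around a literal dot
```

Keeping the dot readable is exactly what makes the stored genotype databases easy to inspect by eye.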
4. EXPERIMENTAL RESULTS

Experiments on GASE were carried out using as benchmarks the Michigan State University/Washington State
University synthetic image database (which we will refer
to as the MSU/WSU database, http://sampl.eng.ohio-state.edu/
sampl/data/3DDB/RID/index.htm) and a subset of the
University of Bern real database (referred to as ABW). The
tests performed are very time consuming, since for a single experiment each segmentation process is iterated many times
(i.e., for each individual of the solution population and for
each generation).
Since we tested our GA with both a fixed and a random
number of children per crossover, following [30], we have to
use an alternative definition of generation. The term generation in GAs is often used as a synonym of the iteration
step and is related to the process of creating a new solution. In our case, a generation step is given by the results
obtained in a fixed time slice. In this manner, we can establish a time slice as a function of the reference workstation; for
instance, with a standard PC (AMD Duron 700 MHz) running Linux OS, we could define the time slice as one minute
of computation. In order to compare the efficacy and efficiency of results, we define as convergence trend the maximum time needed to get the optimal solution within a given MaxG number of
generations.
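The time-sliced notion of generation can be sketched as follows; the hundredth-of-a-second slice and the constant dummy evolutionary step are placeholders for the one-minute slice and the real GA step.

```python
import time

def run(max_generations, slice_seconds, step):
    # One "generation" = whatever evolutionary work fits in a fixed
    # wall-clock slice, rather than a fixed number of offspring.
    best = float("inf")
    for _ in range(max_generations):
        deadline = time.monotonic() + slice_seconds
        while True:
            best = min(best, step())          # do evolutionary work...
            if time.monotonic() >= deadline:  # ...until the slice expires
                break
    return best

best = run(max_generations=2, slice_seconds=0.01, step=lambda: 42.0)
```

Fixing the slice on a reference machine is what makes run times comparable across experiments.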
4.1. Tuning the UB algorithm

The first experiment was the tuning of the UB segmentation algorithm [7]. This algorithm initially tries to detect
the edges (jump and crease [37]) of the image being segmented
by computing the scan lines. After finding the candidates
for area borders, it accomplishes an edge-filling process.


Table 1: USF parameters: meaning and variability range.

Name | Name within code | Range | Meaning
N | WINSIZE | 2–12 | Window radius in which normals are calculated
Tpoint | MAXPTDIST | 0 | Maximum point-to-point distance between pixel and 4-connected neighbor in region
Tperp | MAXPERPDIST | 0 | Maximum perpendicular distance between pixel and plane equation of grown region
Tangle | MAXANGLE | 0.0–180.0 | Maximum angle between normal of pixel and normal of grown region
Tarea | MINREGPIX | 0 | Maximum number of pixels to accept or reject a region

Table 2: UB parameters: meaning and variability range.

Variable name | Meaning | Variable type | Range
Th toleran | Curve segment accuracy | float | 0.5–15.0
Th length | Minimum curve segment length | int | 3 (2)
Th jump | Minimum distance for jump edges | float | 1.0–20.0
Th crease | Minimum angular distance for crease edges | float | 0.0–180.0
Th area | Minimum number of pixels for a valid surface | int | 0
Th morph | Number of postprocessing morphological operators | float | 1.0–3.0
Th PRMSE | Plane region acceptance (RMSE) | float | 0.1–10.0
Th Pavgerr | Plane region acceptance (average error) | float | 0.05–10.0
Th CRMSE | Curve region acceptance (RMSE) | float | 0.1–10.0
Th Cavgerr | Curve region acceptance (average error) | float | 0.05–10.0

(1) The range is limited according to the observed lack of meaning of greater values when segmenting MSU/WSU images, so the shown limits are less than possible.
(2) Fixed by the UB task force; a range from 2 to 4 is allowed.

This segmentation algorithm is capable of segmenting curved surfaces, and the
available version [38] can segment images of the
GRF2-K2T database (named after the brand and model of
the structured-light scanner used). We used a version, slightly
modified at the University of Modena, which is able to segment also synthetic images of the MSU/WSU database. A set
of 35 images was chosen and a tuning task as in [6] was executed.
While the tuning thus done should provide very good results,
it is our opinion that a training set should not be too large.
We then chose a subset of 6 images as our training set. This
set was input to GASE, and the resulting parameter sets were
used to segment the test set (formed by the remaining 29 images) and to find the most suitable set.
We fixed our generation at 1 minute and the maximum
number of generations at 30, that is to say, about 30 minutes
of computation for every image of the training set. It took a
total of about 3 hours to obtain 6 possible solutions and to
select the most suitable one for the test set. During this time our
algorithm performed about 10000 segmentations of the images. An exhaustive search would have to explore the enormous
solution space (the space has 10 dimensions, and one parameter potentially ranges from 0 to ∞) for all the instances
of the test set. In our case, the exhaustive search was substituted by the GA-based search. Nevertheless, it is critical to
test an individual on all images and measure the fitness as a
function of the goodness over the whole training set.
As an acceptable approximation, to save computational
time, we evaluated the fitness of every individual applied to
a single image at a time. We assumed that, thanks to the genetic evolution, when an individual's genotype becomes common in the population, it will be tested on different images.
At the end, the best-scored individuals are tested on all images
of the training set, and the one that outperforms the others on
average is selected as the best.
In Table 3, we show the parameters used for this test.
With original opt. val. we refer to the parameters tuned by
the algorithm authors, while with GASE opt. val. we refer to
those tuned by GASE. In Table 4, we show the average scores
obtained in this test. Although the improvement could seem
poor, it is not, because of the presence of images with very
different characteristics, which were not considered in the
training set. As a matter of fact, the fitness improvement is
in most cases of one or more units (see Figures 2 and
3, where original and GASE opt. val. are compared). The best
improvement was of 11.26 points, while in one case only did the
GASE optimization generate a worse result with respect
to the manual selection.


Table 3: Parameters sets for modified UB as tuned by the algorithm authors and by GASE.

Parameter | Original opt. val. | GASE opt. val.
Th SegmToler | 7.5 | 3.61
Th Jump | 10.0 | 4.55
Th Crease | 30.0 | 36.78
Th PRMSE | 1.11 | 0.51
Th PAvErr | 1.07 | 0.21
Th CRMSE | 1.11 | 0.57
Th CAvErr | 1.09 | 0.45
Th PostprFact | 2.0 | 1.79
Th SegmLen | 3 | 2
Th RegArea | 100 |

Table 4: Average fitness values as allowed by original opt. val. and by GASE opt. val.

Parameters set | Average fitness
Original | 15.96
GASE | 15.04

Figure 3: Improvement of obtained segmentation for column1-3: (a) range, (b) ground truth, (c) original opt. val., fitness = 8.42, (d) GASE opt. val., fitness = 7.65.

Table 5: Parameters sets for USF as tuned by the algorithm authors and by GASE.

Parameter | Original opt. val. | GASE opt. val.
WINSIZE | 10 | 9
MAXPTDIST | 12.0 | 13.2
MAXPERPDIST | 4.0 | 5.3
MAXANGLE | 25.0 | 11.45
MINREGPIX | 500 | 482

Figure 2: Improvement of obtained segmentation for adapter-1: (a) range, (b) ground truth, (c) original opt. val., fitness = 4.61, (d) GASE opt. val., fitness = 3.91.

4.2. Tuning the USF algorithm

The second experiment was performed on the USF segmentation algorithm [6]. Based on a region-growing strategy, it computes the normal vector for each pixel within a
parametric-sized window. After that first computation, it selects seed points on the basis of a reliability measure. From
these seed points, it accomplishes the region growing, aggregating surfaces until at least one of four parametric criteria
is met. This segmentation algorithm has been tuned using a
set of parameters proposed by its authors. As we can see in
[6], the given results are very impressive, so we knew how
difficult it would be to improve on them. Nevertheless, we performed the following experiment: given the original training


Table 6: Average results of the USF segmentation algorithm with original opt. val. and GASE opt. val. on 10 ABW images at 80% compare tolerance (we recall that the tool measures segmentation algorithm performances with respect to a certain precision tolerance, ranging from 51 to 95%).

Parameters set | GT regions | Correct detection | Angle diff. (std. dev.) | Oversegmentation | Undersegmentation | Missed | Noise
Original | 20.1 | 13.1 | 1.24 (0.96) | 0.1 | 0.0 | 6.9 | 2.8
GASE | 20.1 | 12.9 | 1.27 (0.99) | 0.1 | 0.0 | 7.1 | 3.7

set (10 images of the ABW database), we chose one image
as our training set and the other 9 as the test set. Then we
compared the results on this subset to the corresponding former results on the same subset, using the comparison tool
presented in [6]. The comparison tool considers five types
of region classification: correct detection, oversegmentation,
undersegmentation, miss-segmentation, and noise segmentation. When all region classifications have been determined,
a metric describing the accuracy of the recovered geometry is
computed: any pair of regions R1 and R2 in the ground truth
image, representing adjacent faces of the same object, have
their angle An recorded in the truth data. If R1 and R2 are
classified as correct detections, the angle Am between the surface normals of their corresponding regions in the machine-segmented image is computed. Then |An − Am| is computed
for every correct detection classification. The number of angle comparisons, the average error, and the standard deviation are reported, giving an indirect estimation of the accuracy of the recovered geometry of the correctly segmented
portion of the image.
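The angle metric described above can be sketched as follows; the surface normals and ground-truth angles below are fabricated examples, not data from [6].

```python
import math
import statistics

def angle_deg(n1, n2):
    # Angle between two 3D normal vectors, in degrees, clamped for safety.
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.dist((0, 0, 0), n1) * math.dist((0, 0, 0), n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

pairs = [  # (A_n from the truth data, recovered normals of the two faces)
    (90.0, ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))),
    (90.0, ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))),
]
errors = [abs(An - angle_deg(n1, n2)) for An, (n1, n2) in pairs]
avg, std = statistics.mean(errors), statistics.pstdev(errors)
```

Here both recovered dihedral angles match the truth data, so the reported average error and standard deviation are (numerically) zero.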
The set as tuned by GASE is shown in Table 5, and we refer to
it as GASE opt. val. The same table also includes the parameters as tuned in [6], which are referred to as original opt. val.
The results are not better than those presented in [6], but in
a limited amount of time (we fixed the search at 15 generations), we reached a good result considering that the solution
space was larger than that considered in [6]. Moreover, no
information is given about the time spent to select the solution space, while an average time can easily be determined
to explore the whole solution space to select the original
opt. val.
In Table 6, we present the results determined by the two
sets with a precision tolerance of 80% (see [6]). In Figure 4,
we show the plots corresponding to the experiment. The
comparison tool provides five error measures, in addition to
a measure of correctness. All these measures are related to
a tolerance percentage. The plots of Figures 4a, 4b, 4c, 4d, and
4e show the results on the training set of the original opt.
val. (curve labeled HE) versus GASE opt. val. (labeled
GA). The comparison is very interesting, especially considering that the heuristic selection was performed on a small
solution space and tuned on all 10 images, while the GASE
one, although optimized by GAs, was tuned on a single image only.
In particular, Figure 4a indicates that both parameter
sets achieve the same number of correct instances over the
training set, while Figures 4b and 4c demonstrate that, for
problems of over- and undersegmentation, GASE and original opt. val. have opposite behaviors, since GASE produces fewer undersegmentation errors but more oversegmentation. Finally, the last two plots show that there is
no noticeable difference in noise segmentation and miss-segmentation.
5. DISCUSSION AND CONCLUSIONS

The segmentation of range images is a challenging problem,
both for the selection of the most appropriate algorithm (region growing, edge filling, clustering, etc.) and for the accuracy obtained. A variety of systems to perform this task
have been presented in the literature (we recall [6, 15]), and
all of them need accurate parameter tuning, according to
the image characteristics.
A tool to compare results was proposed in [6], and it has
been used to guide parameter tuning (as in [2, 3, 4]),
using only one of the given measures. The tuning methods
are based either on careful selection or on a solution space-partitioning search which limits the dimensions of the solution space.
We proposed an automated search method, based on genetic algorithms, that allows us to search a large solution
space while requiring a manageable amount of computation
time (according to the chosen segmentation algorithm). To
guide the search, we used a fitness function that combines
different measures given by the comparison tool (although
using a different source code). We thus implemented a system, called GASE, to test different segmentation algorithms,
namely, UB and USF.
We saw that for UB, we obtained excellent results,
improving both the quality and the speed of segmentation. For USF, we obtained reasonable results, similar to those proposed by the authors, but without having any knowledge about the nature of the parameters. In
fact, GAs start from random values of the parameter set and
are able to reach a similar solution in relatively few generations. Finally, embedded in GASE and as a stand-alone
tool, an algorithm to robustly award a scalar value to a
segmentation was proposed.
We believe that this work provides the basis for designing a
wizard (or expert system) helping human operators in segmenting images. Our final aim is to build an interactive system that, after an unsupervised training time, will help human operators in the task of obtaining good segmentations.
The expert system will provide the framework for the operator to decide the parameters to segment a single surface or a subset
of surfaces in a complex scene (as done in [39]).

Figure 4: Results, as measured by the comparison tool, obtained by the original opt. val. (labeled HE) and GASE opt. val. (labeled GA) on 10 images of the ABW database (ABW structured-light images), plotted against the compare tool tolerance (%): (a) average correct detections, (b) average oversegmentations, (c) average undersegmentations, (d) average noise regions, and (e) average missed regions.



REFERENCES
[1] B. Bhanu, S. Lee, and J. Ming, "Adaptive image segmentation using a genetic algorithm," IEEE Trans. Systems, Man, and Cybernetics, vol. 25, no. 12, pp. 1543–1567, 1995.
[2] J. Min, M. W. Powell, and K. W. Bowyer, "Progress in automated evaluation of curved surface range image segmentation," in Proc. IEEE International Conference on Pattern Recognition (ICPR '00), pp. 1644–1647, Barcelona, Spain, September 2000.
[3] J. Min, M. W. Powell, and K. W. Bowyer, "Automated performance evaluation of range image segmentation," in IEEE Workshop on Applications of Computer Vision, pp. 163–168, Palm Springs, Calif, USA, December 2000.
[4] J. Min, M. W. Powell, and K. W. Bowyer, "Objective, automated performance benchmarking of region segmentation algorithms," in Proc. International Conf. on Computer Vision, Barcelona, Spain, 2000.
[5] X. Jiang, "An adaptive contour closure algorithm and its experimental evaluation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1252–1265, 2000.
[6] A. Hoover, G. Jean-Baptiste, X. Jiang, et al., "An experimental comparison of range image segmentation algorithms," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 673–689, 1996.
[7] X. Jiang and H. Bunke, "Edge detection in range images based on scan line approximation," Computer Vision and Image Understanding, vol. 73, no. 2, pp. 183–199, 1999.
[8] X. Jiang, H. Bunke, and U. Meier, "High-level feature based range image segmentation," Image and Vision Computing, vol. 18, no. 10, pp. 817–822, 2000.
[9] Y. Zhang, Y. Sun, H. Sari-Sarraf, and M. A. Abidi, "Impact of intensity edge map on segmentation of noisy range images," in Three-Dimensional Image Capture and Applications III, vol. 3958 of SPIE Proceedings, pp. 260–269, San Jose, Calif, USA, January 2000.
[10] I. S. Chang and R.-H. Park, "Segmentation based on fusion of range and intensity images using robust trimmed methods," Pattern Recognition, vol. 34, no. 10, pp. 1951–1962, 2001.
[11] M. Baccar, L. A. Gee, and M. A. Abidi, "Reliable location and regression estimates with application to range image segmentation," Journal of Mathematical Imaging and Vision, vol. 11, no. 3, pp. 195–205, 1999.
[12] H. D. Cheng, X. Jiang, Y. Sun, and J. Wang, "Color image segmentation: advances and prospects," Pattern Recognition, vol. 34, no. 12, pp. 2259–2281, 2001.
[13] X. Jiang, K. W. Bowyer, Y. Morioka, et al., "Some further results of experimental comparison of range image segmentation algorithms," in Proc. IEEE International Conference on Pattern Recognition (ICPR '00), vol. 4, pp. 877–881, Barcelona, Spain, September 2000.
[14] M. W. Powell, K. W. Bowyer, X. Jiang, and H. Bunke, "Comparing curved-surface range image segmenters," in Proc. International Conf. on Computer Vision, pp. 286–291, Bombay, India, January 1998.
[15] H. Frigui and R. Krishnapuram, "A robust competitive clustering algorithm with applications in computer vision," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 450–465, 1999.
[16] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.
[17] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, New York, NY, USA, 2nd edition, 1994.
[18] M. Srinivas and L. M. Patnaik, "Genetic algorithms: a survey," IEEE Computer, vol. 27, no. 6, pp. 17–26, 1994.
[19] X. Yu, T. D. Bui, and A. Krzyzak, "Robust estimation for range image segmentation and reconstruction," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 5, pp. 530–538, 1994.
[20] D. N. Chun and H. S. Yang, "Robust image segmentation using genetic algorithm with a fuzzy measure," Pattern Recognition, vol. 29, no. 7, pp. 1195–1211, 1996.
[21] P. Andrey and P. Tarroux, "Unsupervised image segmentation using a distributed genetic algorithm," Pattern Recognition, vol. 27, no. 5, pp. 659–673, 1994.
[22] M. Yoshimura and S. Oe, "Evolutionary segmentation of texture image using genetic algorithms towards automatic decision of optimum number of segmentation areas," Pattern Recognition, vol. 32, no. 12, pp. 2041–2054, 1999.
[23] D.-C. Tseng and C.-C. Lai, "A genetic algorithm for MRF-based segmentation of multi-spectral textured images," Pattern Recognition Letters, vol. 20, no. 14, pp. 1499–1510, 1999.
[24] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, Series B, vol. 48, no. 3, pp. 259–302, 1986.
[25] S. Cagnoni, A. B. Dobrzeniecki, R. Poli, and J. C. Yanch, "Genetic algorithm-based interactive segmentation of 3D medical images," Image and Vision Computing, vol. 17, no. 12, pp. 881–895, 1999.
[26] P. Andrey, "Selectionist relaxation: genetic algorithms applied to image segmentation," Image and Vision Computing, vol. 17, no. 3-4, pp. 175–187, 1999.
[27] L. S. Davis and A. Rosenfeld, "Cooperating processes for low-level vision: a survey," Artificial Intelligence, vol. 17, no. 1-3, pp. 245–263, 1981.
[28] K. I. Laws, "The Phoenix image segmentation system: description and evaluation," Tech. Rep. 289, SRI International, Menlo Park, Calif, USA, December 1982.
[29] J. T. Alander, "Indexed bibliography of genetic algorithms in optics and image processing," Tech. Rep. 94-1-OPTICS, Department of Information Technology and Production Economics, University of Vaasa, Vaasa, Finland, 2000.
[30] L. Cinque, R. Cucchiara, S. Levialdi, S. Martinz, and G. Pignalberi, "Optimal range segmentation parameters through genetic algorithms," in Proc. IEEE International Conference on Pattern Recognition (ICPR '00), pp. 1474–1477, Barcelona, Spain, September 2000.
[31] L. J. Eshelman, R. A. Caruana, and J. D. Schaffer, "Biases in the crossover landscape," in Proc. 3rd International Conference on Genetic Algorithms, J. D. Schaffer, Ed., pp. 10–19, Fairfax, Va, USA, June 1989.
[32] G. Syswerda, "Uniform crossover in genetic algorithms," in Proc. 3rd International Conference on Genetic Algorithms, J. D. Schaffer, Ed., pp. 2–9, Fairfax, Va, USA, June 1989.
[33] M. D. Levine and A. M. Nazif, "An experimental rule-based system for testing low level segmentation strategies," in Multicomputers and Image Processing: Algorithms and Programs, K. Preston and L. Uhr, Eds., pp. 149–160, Academic Press, New York, NY, USA, 1982.
[34] Y. W. Lim and S. U. Lee, "On the color image segmentation algorithm based on the thresholding and the fuzzy C-means techniques," Pattern Recognition, vol. 23, no. 9, pp. 935–952, 1990.
[35] L. Cinque, R. Cucchiara, S. Levialdi, and G. Pignalberi, "A methodology to award a score to range image segmentation," in Proc. 6th International Conference on Pattern Recognition and Information Processing, pp. 171–175, Minsk, Belarus, May 2001.
[36] F. Herrera, M. Lozano, and J. L. Verdegay, "Tackling real-coded genetic algorithms: operators and tools for behavioural analysis," Artificial Intelligence Review, vol. 12, no. 4, pp. 265–319, 1998.
[37] R. Hoffman and A. K. Jain, "Segmentation and classification of range images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, no. 5, pp. 608–620, 1987.
[38] Range image segmentation comparison project, 2002, http://marathon.csee.usf.edu/range/seg-comp/results.html.
[39] L. Cinque, R. Cucchiara, S. Levialdi, and G. Pignalberi, "A decision support system for range image segmentation," in Proc. 3rd International Conference on Digital Information Processing and Control in Extreme Situations, pp. 45–50, Minsk, Belarus, May 2002.
Gianluca Pignalberi received his degree in computer science in 2000 from the University of Rome "La Sapienza," focusing especially on image processing and artificial intelligence methods. He is a consultant, and his current interests include language recognition and data compression techniques, combined with artificial intelligence methods.
Rita Cucchiara graduated magna cum laude in 1989 with the Laurea in electronic engineering from the University of Bologna and received the Ph.D. in computer engineering from the University of Bologna in 1993. She was an Assistant Professor at the University of Ferrara and has been an Associate Professor in computer engineering at the Faculty of Engineering of Modena, University of Modena and Reggio Emilia, Italy, since 1998. Her research activity includes computer vision and pattern recognition, in particular image segmentation, genetic algorithms for optimization, motion analysis, and color analysis. She is currently involved in research projects on video surveillance, domotics, video transcoding for high-performance video servers, and support to medical diagnosis with image analysis. Rita Cucchiara is a member of the IEEE, ACM, GIRPR (Italian IAPR), and AIxIA.
Luigi Cinque received his Ph.D. degree in physics from the University of Napoli in 1983. From 1984 to 1990, he was with the Laboratory of Artificial Intelligence (Alenia SpA). Presently, he is a Professor at the Department of Computer Science of the University of Rome "La Sapienza." His scientific interests cover image sequence analysis, shape and object recognition, image databases, and advanced man-machine interaction. Professor Cinque is presently an Associate Editor of Pattern Recognition and Pattern Recognition Letters. He is a senior member of IEEE, ACM, and IAPR. He has served on the program committees of many international conferences in the field of imaging technology, and he is the author of over 100 scientific publications in international journals and conference proceedings.

EURASIP Journal on Applied Signal Processing


Stefano Levialdi graduated as a telecommunications engineer from the University of Buenos Aires in 1959. He has been at the University of Rome "La Sapienza" since 1983, teaching two courses on human-computer interaction. His research interests are in visual languages, human-computer interaction, and usability. He is the Director of the Pictorial Computing Laboratory, has been an IEEE Fellow since 1991 (now a Life Fellow), and has been the General Chair of over 35 international conferences; he will be the General Chairman of the IFIP Interact '05 Conference to be held in Rome, Italy.

EURASIP Journal on Applied Signal Processing 2003:8, 791–805
© 2003 Hindawi Publishing Corporation


Parameter Estimation of a Plucked String Synthesis Model Using a Genetic Algorithm with Perceptual Fitness Calculation
Janne Riionheimo
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000,
FIN-02015 HUT, Espoo, Finland
Email: janne.riionheimo@hut.fi

Vesa Välimäki
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000,
FIN-02015 HUT, Espoo, Finland
Pori School of Technology and Economics, Tampere University of Technology, P.O. Box 300,
FIN-28101 Pori, Finland
Email: vesa.valimaki@hut.fi
Received 30 June 2002 and in revised form 2 December 2002
We describe a technique for estimating control parameters for a plucked string synthesis model using a genetic algorithm. The
model has been intensively used for sound synthesis of various string instruments but the fine tuning of the parameters has been
carried out with a semiautomatic method that requires some hand adjustment with human listening. An automated method for
extracting the parameters from recorded tones is described in this paper. The calculation of the fitness function utilizes knowledge
of the properties of human hearing.
Keywords and phrases: sound synthesis, physical modeling synthesis, plucked string synthesis, parameter estimation, genetic
algorithm.

1. INTRODUCTION
Model-based sound synthesis is a powerful tool for creating
natural sounding tones by simulating the sound production
mechanisms and physical behavior of real musical instruments. These mechanisms are often too complex to simulate
in every detail, so simplified models are used for synthesis.
The aim is to generate a model that is perceptually indistinguishable from real instruments.
One workable method for physical modelling synthesis is
based on digital waveguide theory proposed by Smith [1]. In
the case of the plucked string instruments, the method can
be extended to model also the plucking style and instrument
body [2, 3]. A synthesis model of this kind can be applied to synthesize various plucked string instruments by changing the control parameters and using different body and plucking models [4, 5]. A characteristic feature of string instrument tones is the double decay and beating effect [6], which can be implemented by using two slightly mistuned string models in parallel to simulate the two polarizations of the transversal vibratory motion of a real string [7].

Parameter estimation is an important and difficult challenge in sound synthesis. Usually, the natural parameter settings are in great demand at the initial stage of the synthesis.
When using these parameters with a model, we are able to
produce real-sounding instrument tones. Various methods
for adjusting the parameters to produce the desired sounds
have been proposed in the literature [4, 8, 9, 10, 11, 12].
An automated parameter calibration method for a plucked
string synthesis model has been proposed in [4, 8], and then
improved in [9]. It gives the estimates for the fundamental
frequency, the decay parameters, and the excitation signal
which is used in commuted synthesis.
Our interest in this paper is the parameter estimation of
the model proposed by Karjalainen et al. [7]. The parameters
of the model have earlier been calibrated automatically, but
the fine-tuning has required some hand adjustment. In this
work, we use recorded tones as a target sound with which the
synthesized tones are compared. All synthesized sounds are
then ranked according to their similarity with the recorded
tone. An accurate way to measure sound quality from the

viewpoint of auditory perception would be to carry out listening tests with trained participants and rank the candidate
solutions according to the data obtained from the tests [13].
This method is extremely time consuming and, therefore, we
are forced to use analytical methods to calculate the quality of
the solutions. Various techniques to simulate human hearing
and calculate perceptual quality exist. Perceptual linear predictive (PLP) technique is widely used with speech signals
[14], and frequency-warped digital signal processing is used
to implement perceptually relevant audio applications [15].
In this work, we use an error function that simulates
the human hearing and calculates the perceptual error between the tones. Frequency masking behavior, frequency dependence, and other limitations of human hearing are taken
into account. From the optimization point of view, the task
is to find the global minimum of the error function. The
variables of the function, that is, the parameters of the synthesis model, span the parameter space where each point
corresponds to a set of parameters and thus to a synthesized sound. When dealing with discrete parameter values,
the number of parameter sets is finite and given by the product of the numbers of possible values of each parameter. Using nine control parameters with 100 possible values each, a total of 10^18 combinations exist in the space and, therefore, an exhaustive search is obviously impossible.
Evolutionary algorithms have shown a good performance
in optimizing problems relating to the parameter estimation
of synthesis models. Vuori and Välimäki [16] tried a simulated evolution algorithm for the flute model, and Horner et
al. [17] proposed an automated system for parameter estimation of FM synthesizer using a genetic algorithm (GA). GAs
have been used for automatically designing sound synthesis
algorithms in [18, 19]. In this study, a GA is used to optimize
the perceptual error function.
This paper is sectioned as follows. The plucked string
synthesis model and the control parameters to be estimated
are described in Section 2. Parameter estimation problem
and methods for solving it are discussed in Section 3.
Section 4 concentrates on the calculation of the perceptual
error. In Section 5, we discretize the parameter space in a
perceptually reasonable manner. Implementation of the GA
and different schemes for selection, mutation, and crossover
used in our work are surveyed in Section 6. Experiments and
results are analyzed in Section 7 and conclusions are finally
drawn in Section 8.
2. PLUCKED STRING SYNTHESIS MODEL

The model proposed by Karjalainen et al. [7] is used for plucked string synthesis in this study. The block diagram of the model is presented in Figure 1. It is based on digital waveguide synthesis theory [1] that is extended in accordance with the commuted waveguide synthesis approach [2, 3] to include also the body modes of the instrument in the string synthesis model.

Different plucking styles and body responses are stored as wavetables in the memory and used to excite the two string
Figure 1: The plucked string synthesis model.
Figure 2: The basic string model.

models S_h(z) and S_v(z) that simulate the effect of the two polarizations of the transversal vibratory motion. A single string model S(z) in Figure 2 consists of a lowpass filter H(z) that controls the decay rate of the harmonics, a delay line z^{-L_I}, and a fractional delay filter F(z). The delay time around the loop for a given fundamental frequency f_0 is

L_d = \frac{f_s}{f_0},    (1)

where f_s is the sampling rate (in Hz). The loop delay L_d is implemented by the delay line z^{-L_I} and the fractional delay filter F(z). The delay line is used to control the integer part L_I of the string length while the coefficients of the filter F(z) are adjusted to produce the fractional part L_f [20]. The fractional delay filter F(z) is implemented as a first-order allpass filter. Two string models are typically slightly mistuned to produce a natural sounding beating effect.
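As an illustration of (1), the loop delay can be split into the integer delay-line length and the fractional part handled by the allpass filter. The sketch below is our own (the function name and the first-order allpass coefficient formula are standard fractional-delay approximations, not taken from the paper):

```python
import math

def loop_delay_parameters(f0, fs=44100.0):
    """Split the loop delay L_d = fs/f0 (Eq. (1)) into the integer
    delay-line length L_I and the fractional part L_f realized by a
    first-order allpass filter F(z)."""
    Ld = fs / f0                  # total loop delay in samples
    LI = int(math.floor(Ld))      # integer part -> delay line z^-LI
    Lf = Ld - LI                  # fractional part -> allpass F(z)
    # A first-order allpass F(z) = (a_f + z^-1) / (1 + a_f z^-1) has a
    # low-frequency phase delay of (1 - a_f) / (1 + a_f) samples, so:
    a_f = (1.0 - Lf) / (1.0 + Lf)
    return LI, Lf, a_f

LI, Lf, a_f = loop_delay_parameters(330.0)  # L_d = 44100/330 ~ 133.64
```

In practice the other loop components (H(z) and F(z) themselves) also contribute delay, which this sketch ignores.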
A one-pole filter with transfer function

H(z) = g \frac{1 + a}{1 + a z^{-1}}    (2)

is used as a loop filter in the model. Parameter 0 < g < 1 in (2) determines the overall decay rate of the sound while parameter -1 < a < 0 controls the frequency-dependent decay. The excitation signal is scaled by the mixing coefficients m_p and (1 - m_p) before sending it to the two string models. Coefficient g_c enables coupling between the two polarizations. Mixing coefficient m_o defines the proportion of the two polarizations in the output sound. All parameters m_p, g_c, and m_o are chosen to have values between 0 and 1. The transfer function of the entire model is written as




M(z) = m_p m_o S_h(z) + (1 - m_p)(1 - m_o) S_v(z) + m_p (1 - m_o) g_c S_h(z) S_v(z),    (3)

Parameter Estimation Using a Genetic Algorithm
Table 1: Control parameters of the synthesis model.

Parameter   Control
f_{0,h}     Fundamental frequency of the horizontal string model
f_{0,v}     Fundamental frequency of the vertical string model
g_h         Loop gain of the horizontal string model
a_h         Frequency-dependent gain of the horizontal string model
g_v         Loop gain of the vertical string model
a_v         Frequency-dependent gain of the vertical string model
m_p         Input mixing coefficient
m_o         Output mixing coefficient
g_c         Coupling gain of the two polarizations

where the string models S_h(z) and S_v(z) for the two polarizations can be written as an individual string model

S(z) = \frac{1}{1 - z^{-L_I} F(z) H(z)}.    (4)

A synthesis model of this kind has been intensively used for sound synthesis of various plucked string instruments [5, 21, 22]. Different methods for estimating the parameters have been used but, in consequence of the interaction between the parameters, systematic methods are troublesome at best and probably impossible. The nine parameters that are used to control the synthesis model are listed in Table 1.
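A minimal sketch of a single string model S(z) as a sample-by-sample feedback loop is given below. This is our own illustration of (1)-(4), not the authors' implementation; the allpass coefficient uses the standard first-order fractional-delay approximation:

```python
import numpy as np

def string_model(excitation, f0, g, a, fs=44100.0):
    """Sketch of one string model S(z) = 1 / (1 - z^-LI F(z) H(z)):
    a delay line of L_I samples, a one-pole loop filter H(z) (Eq. (2)),
    and a first-order allpass F(z) for the fractional delay."""
    Ld = fs / f0
    LI = int(Ld)
    Lf = Ld - LI
    af = (1.0 - Lf) / (1.0 + Lf)    # allpass coefficient for delay Lf
    delay = np.zeros(LI)            # circular buffer implementing z^-LI
    h_state = 0.0                   # one-pole loop filter state
    f_x1 = f_y1 = 0.0               # allpass filter states
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        v = delay[n % LI]           # delay-line output, y(n - LI)
        # H(z) = g (1 + a) / (1 + a z^-1)
        h = g * (1.0 + a) * v - a * h_state
        h_state = h
        # F(z) = (af + z^-1) / (1 + af z^-1)
        f = af * h + f_x1 - af * f_y1
        f_x1, f_y1 = h, f
        y = excitation[n] + f       # close the feedback loop
        delay[n % LI] = y
        out[n] = y
    return out
```

With g < 1 the loop gain is below unity at all frequencies, so an impulse excitation produces a decaying pluck-like tone.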
3. ESTIMATION OF THE MODEL PARAMETERS

Determination of the proper parameter values for sound synthesis systems is an important problem, and it also depends on the purpose of the synthesis. When the goal is to imitate the sounds of real instruments, the aim of the estimation is unambiguous: we wish to find a parameter set which gives a sound output that is sufficiently similar to the natural one in terms of human perception. These parameters are also feasible for virtual instruments at the initial stage, after which the limits of real instruments can be exceeded by adjusting the parameters in more creative ways.
Parameters of a synthesis model correspond normally
to the physical characteristics of an instrument [7]. The
estimation procedure can then be seen as sound analysis
where the parameters are extracted from the sound or from
the measurements of physical behavior of an instrument
[23]. Usually, the model parameters have to be fine-tuned
by laborious trial and error experiments, in collaboration
with accomplished players [23]. Parameters for the synthesis model in Figure 1 have earlier been estimated this way
and recently in a semiautomatic fashion, where some parameter values can be obtained with an estimation algorithm while others must be guessed. Another approach is
to consider the parameter estimation problem as a nonlinear optimization process and take advantage of the general searching methods. All possible parameter sets can then
be ranked according to their similarity with the desired
sound.

3.1. Calibrator

A brief overview of the calibration scheme used earlier with the model is given here. The fundamental frequency f_0 is
first estimated using the autocorrelation method. The frequency estimate in samples from (1) is used to adjust the delay line length LI and the coecients of the fractional delay
filter F(z). The amplitude, frequency, and phase trajectories
for partials are analyzed using the short-time Fourier transform (STFT), as in [4]. The estimates for loop filter parameters g and a are then analyzed from the envelopes of individual partials. The excitation signal for the model is extracted
from the recorded tone by a method described in [24]. The
amplitude, frequency, and phase trajectories are first used to
synthesize the deterministic part of the original signal and
the residual is obtained by a time-domain subtraction. This
produces a signal which lacks the energy to excite the harmonics when used with the synthesis model. This is avoided
by inverse filtering the deterministic signal and the residual
separately. The output signal of the model is finally fed to
the optimization routine which automatically fine-tunes the
model parameters by analyzing the time-domain envelope of
the signal.
The difference in the length of the delay lines can be estimated based on the beating of a recorded tone. In [25],
the beating frequency is extracted from the first harmonic
of a recorded string instrument tone by fitting a sine wave
using the least squares method. Another procedure for extracting beating and two-stage decay from the string tones is
described by Bank in [26]. In practice, the automatic calibrator algorithm is first used to find decent values for the control parameters of one string model. These values are also used for the other string model. The mistuning between the two string models has then been found by ear [5] and the differences in the decay parameters are set by trial and error. Our method automatically extracts the nine control parameter values from recorded tones.
3.2. Optimization

Instead of extracting the parameters from audio measurements, our approach here is to find the parameter set that
produces a tone that is perceptually indistinguishable from
the target one. Each parameter set can be assigned with a


quality value which denotes how good is the candidate solution. This performance metric is usually called a fitness
function, or inversely, an error function. A parameter set is
fed into the fitness function which calculates the error between the corresponding synthesized tone and the desired
sound. The smaller the error, the better the parameter set and
the higher the fitness value. These functions give a numerical grade to each solution, by means of which we are able to
classify all possible parameter sets.
4. FITNESS CALCULATION

Human hearing analyzes sound both in the frequency and time domains. Since the spectra of all musical sounds vary with time, it is appropriate to calculate the spectral similarity in short time segments. A common method is to measure the least squared error of the short-time spectra of the two sounds [17, 18]. The STFT of signal y(n) is a sequence of discrete Fourier transforms (DFT)

Y(m, k) = \sum_{n=0}^{N-1} w(n) y(n + mH) e^{-j \omega_k n},   m = 0, 1, 2, ...,    (5)

with

\omega_k = \frac{2 \pi k}{N},   k = 0, 1, 2, ..., N - 1,    (6)

where N is the length of the DFT, w(n) is a window function, and H is the hop size or time advance (in samples) per frame. Integers m and k refer to the frame index and frequency bin, respectively. When N is a power of two, for example, 1024, each DFT can be computed efficiently with the FFT algorithm. If o(n) is the output sound of the synthesis model and t(n) is the target sound, then the error (inverse of the fitness) of the candidate solution is calculated as follows:

E = \frac{1}{F} = \frac{1}{L} \sum_{m=0}^{L-1} \sum_{k=0}^{N-1} \left| O(m, k) - T(m, k) \right|^2,    (7)

where O(m, k) and T(m, k) are the STFT sequences of o(n) and t(n) and L is the length of the sequences.
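The error measure of (5)-(7) can be sketched in a few lines of NumPy (our illustration; the window choice and frame parameters are assumptions):

```python
import numpy as np

def stft(x, N=1024, H=256):
    """Sequence of windowed DFTs, Eq. (5): frames of length N, hop H."""
    w = np.hanning(N)
    frames = 1 + (len(x) - N) // H
    return np.array([np.fft.fft(w * x[m * H : m * H + N])
                     for m in range(frames)])

def spectral_error(o, t, N=1024, H=256):
    """Squared STFT difference of Eq. (7), averaged over the L frames."""
    O, T = stft(o, N, H), stft(t, N, H)
    L = min(len(O), len(T))
    return np.sum(np.abs(O[:L] - T[:L]) ** 2) / L
```

The error is zero only when the two short-time spectra coincide frame by frame, which is the sense in which the fitness ranks candidate parameter sets.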
4.1. Perceptual quality
The analytical error calculated from (7) is a raw simplification from the viewpoint of auditory perception. Therefore,
an auditory model is required. One possibility would be to
include the frequency masking properties of human hearing
by applying a narrow band masking curve [27] for each partial. This method has been used to speed up additive synthesis [28] and perceptual wavetable matching for synthesis
of musical instrument tones [29]. One disadvantage of the
method is that it requires peak tracking of partials, which
is a time-consuming procedure. We use here a technique
which determines the threshold of masking from the STFT
sequences. The frequency components below that threshold
are inaudible and, therefore, unnecessary when calculating the perceptual similarity. This technique, proposed in [30], has been successfully applied in audio coding and perceptual error calculation [18].
4.2. Calculating the threshold of masking

The threshold of masking is calculated in several steps:

(1) windowing the signal and calculating the STFT,
(2) calculating the power spectrum for each DFT,
(3) mapping the frequency scale into the Bark domain and calculating the energy per critical band,
(4) applying the spreading function to the critical band energy spectrum,
(5) calculating the spread masking threshold,
(6) calculating the tonality-dependent masking threshold,
(7) normalizing the raw masking threshold and calculating the absolute threshold of masking.
The frequency power spectrum is translated into the Bark scale by using the approximation [27]

\zeta = 13 \arctan\left( \frac{0.76 f}{\text{kHz}} \right) + 3.5 \arctan\left( \left( \frac{f}{7.5\ \text{kHz}} \right)^{2} \right),    (8)

where f is the frequency in Hertz and \zeta is the mapped frequency in Bark units. The energy in each critical band is calculated by summing the frequency components in the critical band. The number of critical bands depends on the sampling rate and is 25 for the sample rate of 44.1 kHz. The discrete representation of fixed critical bands is a close approximation and, in reality, each band builds up around a narrow band excitation. A power spectrum P(k) and the energy per critical band Z(\zeta) for a 12-millisecond excerpt from a guitar tone are shown in Figure 3a.
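The Bark mapping of (8) and the per-band energy summation can be sketched as follows (the helper names are ours, and assigning bins to bands by flooring the Bark value is a simplification):

```python
import numpy as np

def hz_to_bark(f):
    """Eq. (8): map frequency in Hz to the Bark scale."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.76 * f / 1000.0) \
         + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_band_energy(power, fs):
    """Sum the one-sided power-spectrum bins P(k) falling into each
    integer-wide critical band to get Z(zeta); 25 bands cover the
    audio range at fs = 44.1 kHz."""
    k = np.arange(len(power))
    freqs = k * fs / (2.0 * (len(power) - 1))  # bins up to fs/2
    bands = np.floor(hz_to_bark(freqs)).astype(int)
    Z = np.zeros(bands.max() + 1)
    np.add.at(Z, bands, power)                 # accumulate per band
    return Z
```

For example, 1 kHz maps to roughly 8.5 Bark, consistent with the usual critical-band scale.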
The effect of masking of each narrow band excitation spreads across all critical bands. This is described by a spreading function given in [31]

10 \log_{10} B(\zeta) = 15.91 + 7.5 (\zeta + 0.474) - 17.5 \sqrt{1 + (\zeta + 0.474)^{2}}\ \text{dB}.    (9)

The spreading function is presented in Figure 3b. The spreading effect is applied by convolving the critical band energy function Z(\zeta) with the spreading function B(\zeta) [30]. The spread energy per critical band S_P(\zeta) is shown in Figure 3c.
The masking threshold depends on the characteristics of the masker and the masked tone. Two different thresholds are detailed and used in [30]. For tone masking noise, the threshold is estimated as (14.5 + \zeta) dB below S_P. For noise masking tone, it is estimated as 5.5 dB below S_P. A spectral flatness measure is used to determine the noiselike or tonelike characteristics of the masker. The spectral flatness measure V is defined in [30] as the ratio of the geometric to the arithmetic mean of the power spectrum. The tonality factor \alpha is defined as follows:

\alpha = \min\left( \frac{V}{V_{\max}}, 1 \right),    (10)

Figure 3: Determining the threshold of masking for a 12-millisecond excerpt from a recorded guitar tone. Fundamental frequency of the tone is 331 Hz. (a) Power spectrum (solid line) and energy per critical band (dashed line). (b) Spreading function. (c) Power spectrum (solid line) and spread energy per critical band (dashed line). (d) Power spectrum (solid line) and final masking threshold (dashed line).

where V_{\max} = -60 dB. That is to say that if the masker signal is entirely tonelike, then \alpha = 1, and if the signal is pure noise, then \alpha = 0. The tonality factor is used to geometrically weight the two thresholds mentioned above to form the masking energy offset U(\zeta) for a critical band

U(\zeta) = \alpha (14.5 + \zeta) + 5.5 (1 - \alpha).    (11)

The offset is then subtracted from the spread spectrum to estimate the raw masking threshold

R(\zeta) = 10^{\log_{10}(S_P(\zeta)) - U(\zeta)/10}.    (12)

Convolution of the spreading function and the critical band energy function increases the energy level in each band. The normalization procedure used in [30] takes this into account and divides each component of R(\zeta) by the number of points in the corresponding band

Q(\zeta) = \frac{R(\zeta)}{N_p},    (13)

where N_p is the number of points in the particular critical band. The final threshold of masking for a frequency spectrum W(k) is calculated by comparing the normalized threshold to the absolute threshold of hearing and mapping from Bark to the frequency scale. The most sensitive area in human hearing is around 4 kHz. If the normalized

energy Q(\zeta) in any critical band is lower than the energy in a 4 kHz sinusoidal tone with one bit of dynamic range, it is changed to the absolute threshold of hearing. This is a simplified method to set the absolute levels since, in reality, the absolute threshold of hearing varies with frequency. An example of the final threshold of masking is shown in Figure 3d. It is seen that many of the high partials and the background noise at the high frequencies are below the threshold and thus inaudible.
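The offset, raw threshold, and normalization of (10)-(13) can be sketched as follows (a hedged sketch; the dB-domain spectral flatness convention and the function names are our assumptions):

```python
import numpy as np

def raw_masking_threshold(Z_spread, V_sfm_db, zeta):
    """Eqs. (10)-(12): subtract the tonality-weighted offset (in dB)
    from the spread critical-band energy S_P(zeta).
    V_sfm_db is the spectral flatness measure in dB (<= 0);
    zeta is the critical-band index (scalar or array)."""
    alpha = min(V_sfm_db / -60.0, 1.0)              # tonality, Eq. (10)
    U = alpha * (14.5 + zeta) + 5.5 * (1.0 - alpha)  # offset, Eq. (11)
    # Eq. (12): R = 10^(log10(S_P) - U/10)
    return 10.0 ** (np.log10(Z_spread) - U / 10.0)

def normalize_threshold(R, points_per_band):
    """Eq. (13): undo the energy gain of the spreading convolution by
    dividing each band by its number of spectrum points."""
    return R / points_per_band
```

An entirely tonelike masker (flatness at -60 dB) yields alpha = 1 and the larger, band-dependent offset of 14.5 + zeta dB.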
4.3. Calculating the perceptual error

Perceptual error is calculated in [18] by weighting the error from (7) with two matrices

G(m, k) = \begin{cases} 1, & T(m, k) \geq W(m, k), \\ 0, & \text{otherwise}, \end{cases}
\qquad
H(m, k) = \begin{cases} 1, & O(m, k) \geq W(m, k),\ T(m, k) < W(m, k), \\ 0, & \text{otherwise}, \end{cases}    (14)

where m and k refer to the frame index and frequency bin,


as defined previously. Matrices are defined such that the full
error is calculated for spectral components which are audible
in a recorded tone t(n) (that is above the threshold of masking). The matrix G(m, k) is used to account for these components. For the components which are inaudible in a recorded
tone but audible in the sound output of the model o(n), the
error between the sound output and the threshold of masking is calculated. The matrix H(m, k) is used to weight these
components.
Perceptual error E_p is the sum of these two cases. No error is calculated for the components which are below the threshold of masking in both sounds. Finally, the perceptual error function is evaluated as

E_p = \frac{1}{F_p} = \frac{1}{L} \sum_{m=0}^{L-1} \sum_{k=0}^{N-1} \frac{1}{W_s(k)} \Big[ \left| O(m, k) - T(m, k) \right|^2 G(m, k) + \left| O(m, k) - W(m, k) \right|^2 H(m, k) \Big],    (15)

where W_s(k) is an inverted equal loudness curve at a sound pressure level of 60 dB, shown in Figure 4, that is used to weight the error and imitate the frequency-dependent sensitivity of human hearing.
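Equations (14) and (15) translate directly into array operations (our sketch; the array shapes and names are assumptions):

```python
import numpy as np

def perceptual_error(O, T, W, Ws):
    """Eqs. (14)-(15): audibility-weighted STFT error.
    O, T: (L x N) STFT magnitude arrays of model output and target;
    W: threshold of masking per frame/bin (L x N);
    Ws: inverted equal-loudness weight per frequency bin (length N)."""
    G = (T >= W)                   # target component audible
    H = (O >= W) & (T < W)         # only the model output audible
    err = (np.abs(O - T) ** 2) * G + (np.abs(O - W) ** 2) * H
    # divide by Ws(k) along the frequency axis, average over L frames
    return np.sum(err / Ws) / O.shape[0]
```

Components below the threshold in both sounds contribute nothing, matching the text above.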
Figure 4: The frequency-dependent weighting function, which is the inverse of the equal loudness curve at the SPL of 60 dB.

5. DISCRETIZING THE PARAMETER SPACE

The number of data points in the parameter space can be reduced by discretizing the individual parameters in a perceptually reasonable manner. The ranges of the parameters can be reduced to cover only the possible musical tones, and the deviation steps can be kept just below the discrimination threshold.
5.1. Decay parameters

The audibility of variations in the decay of the single string model in Figure 2 has been studied in [32]. The time constant \tau of the overall decay was used to describe the loop gain parameter g while the frequency-dependent decay was controlled directly by parameter a. Values of \tau and a were varied, and relatively large deviations in the parameters were claimed to be inaudible. Järveläinen and Tolonen [32] proposed that a variation of the time constant between 75% and 140% of the reference value can be allowed in most cases. An inaudible variation for the parameter a was between 83% and 116% of the reference value.

The discrimination thresholds were determined with two different tone durations, 0.6 second and 2.0 seconds. In our study, the judgement of similarity between two tones is done by comparing the entire signals and, therefore, the results from [32] cannot be directly used for the parametrization of a and g. The tolerances are slightly smaller because the judgement is made based on not only the decay but also the duration of a tone. Based on our informal listening test and including a margin of certainty, we have defined the variation to be 10% for \tau and 7% for the parameter a. The parameters are bounded so that all the playable musical sounds from tightly damped picks to very slowly decaying notes are possible to produce with the model. This results in 62 discrete nonuniformly distributed values for g and 75 values for a, as shown in Figures 5a and 5b. The corresponding amplitude envelopes of tones with different values of g are shown in Figure 5c. Loop filter magnitude responses for varying parameter a with g = 1 are shown in Figure 5d.
5.2. Fundamental frequency and beating parameters


The fundamental frequency estimate f0 from the calibrator
is used as an initial value for both polarizations. When the

Figure 5: Discretizing the parameters g and a. (a) Discrete values for the parameter g when f_0 = 331 Hz and the variation for the time constant is 10%. (b) Discrete values for the parameter a when the variation is 7%. (c) Amplitude envelopes of tones with different discrete values of g. (d) Loop filter magnitude responses for different discrete values of a when g = 1.
fundamental frequencies of the two polarizations differ, the frequency estimate settles in the middle of the frequencies, as shown in Figure 6. Frequency discrimination thresholds as a function of frequency have been proposed in [33]. Also, the audibility of beating and amplitude modulation has been studied in [27]. These results do not give us directly the discrimination thresholds for the difference in the fundamental frequencies of the two-polarization string model, because the fluctuation strength in an output sound depends on the fundamental frequencies and the decay parameters g and a.
The sensitivity of the parameters can be examined when a synthesized tone with known parameter values is used as a target tone with which another synthesized tone is compared. Varying one parameter after another and freezing the others, we obtain the error as a function of the parameters. In Figure 7, the target values of f_{0,v} and f_{0,h} are 331 and 330 Hz. The solid line shows the error when f_{0,v} is linearly swept from 327 to 334 Hz. The global minimum is obviously found when f_{0,v} = 331 Hz. Interestingly, another nonzero local minimum is found when f_{0,v} = 329 Hz, that is, when the beating is similar. The dashed line shows the error when both f_{0,v} and f_{0,h} are varied but the difference in the fundamental frequencies is kept constant. It can be seen that the difference is more dominant than the absolute frequency value and therefore has to be discretized with higher resolution. Instead of operating on the fundamental frequency parameters directly, we optimize the difference d_f = |f_{0,v} - f_{0,h}| and the mean frequency f_0 = (f_{0,v} + f_{0,h})/2 individually. Combining previous results from [27, 33] with our informal listening test, we have discretized d_f with 100 discrete values and f_0 with 20. The range of variation is set as follows:
r_p = \left( \frac{f_0}{10} \right)^{1/3},    (16)

which is shown in Figure 8.

Figure 7: Error as a function of the fundamental frequencies. The target values of f_{0,v} and f_{0,h} are 331 and 330 Hz. The solid line shows the error when f_{0,h} = 330 Hz and f_{0,v} is linearly swept from 327 to 334 Hz. The dashed line shows the error when both frequencies are varied simultaneously while the difference remains similar.

Figure 6: Three autocorrelation functions: (a) entire autocorrelation function; (b) zoomed around the maximum. Dashed and solid lines show functions for two single-polarization guitar tones with fundamental frequencies of 80 and 84 Hz. The dash-dotted line corresponds to a dual-polarization guitar tone with fundamental frequencies of 80 and 84 Hz.

5.3. Other parameters


The tolerances for the mixing coefficients m_p, m_o, and g_c have not been studied, and the parameters have earlier been adjusted by trial and error [5]. Therefore, no initial guesses are made for these parameters. The sensitivities of the mixing coefficients are examined in an example case in Figure 9, where
m_p = 0.5, m_o = 0.5, and g_c = 0.1. It can be seen that the
parameters m_p and m_o are most sensitive near the boundaries and the parameter g_c is most sensitive near zero. Ranges
for m_p and m_o are discretized with 40 values according to

Figure 8: The range of variation in fundamental frequency as a function of the frequency estimate from 80 to 1000 Hz.

Figure 10. This method is applied to the parameter g_c, the


range of which is limited to 0-0.5.
Discretizing the nine parameters this way results in 2.77 × 10^15 combinations in total for a single tone. For an acoustic guitar, about 120 tones with different dynamic levels and
playing styles have to be analyzed. It is obvious that an exhaustive search is out of question.
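As a sanity check, the quoted total can be recomputed from the per-parameter counts (our reading of Section 5: 62 and 75 values for the decay parameters g and a of each polarization, 100 and 20 for d_f and f_0, and 40 each for m_p, m_o, and g_c):

```python
# Sizes of the discrete scales: g_h, g_v, a_h, a_v, d_f, f0, m_p, m_o, g_c.
sizes = [62, 62, 75, 75, 100, 20, 40, 40, 40]
total = 1
for s in sizes:
    total *= s
print(total)  # 2767680000000000, i.e. about 2.77 * 10^15
```

The product indeed comes out to about 2.77 × 10^15, confirming the figure in the text.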
6. GENETIC ALGORITHM

GAs mimic the evolution of nature and take advantage of the principle of survival of the fittest [34]. These algorithms operate on a population of potential solutions, improving


crete parameter value. The original floating-point operators are discussed in [36], where the characteristics of the operators are also described. A few modifications to the original mutation operators in step 5 have been made to improve the operation of the algorithm with the discrete grid.

The algorithm we use is implemented as follows.

Figure 9: Error as a function of mixing coefficients m_p, m_o, and coupling coefficient g_c. Target values are m_p = m_o = 0.5 and g_c = 0.1.

Value of parameters m p and mo

0.8

0.6

(1) Analyze the recorded tone to be resynthesized using


the analysis methods discussed in Section 3. The range
of the parameter f0 is chosen and the excitation signal is produced according to these results. Calculate
the threshold of masking (Section 4) and the discrete
scales for the parameters (Section 5).
(2) Initialization: create a population of S p individuals
(chromosomes). Each chromosome is represented as
a vector array 
x, with nine components (genes), which
contains the actual parameters. The initial parameter
values are randomly assigned.
(3) Fitness calculation: calculate the perceptual fitness of
each individual in the current population according to
(15).
(4) Selection of individuals: select individuals from the
current population to produce the next generation
based upon the individuals fitness. We use the normalized geometric selection scheme [37], where the
individuals are first ranked according to their fitness
values. The probability of selecting the ith individual
to the next generation is then calculated by
Pi = q (1 q)r 1 ,

(17)

q
,
1 (1 q)S p

(18)

0.4

where
0.2

q =

0
0

10

20
Discrete scale

30

40

Figure 10: Discrete values for the parameters m p and mo .

characteristics of the individuals from generation to generation. Each individual, called a chromosome, is made up of
an array of genes that contain, in our case, the actual parameters to be estimated.
In the original algorithm design, the chromosomes were
represented with binary numbers [35]. Michalewicz [36]
showed that representing the chromosomes with floatingpoint numbers results in faster, more consistent, higher precision, and more intuitive solution of the algorithm. We
use a GA with the floating-point representation, although
the parameter space is discrete, as discussed in Section 5.
We have also experimented with the binary-number representation, but the execution time of the iteration becomes
slow. Nonuniformly graduated parameter space is transformed into the uniform scales where the GA operates on.
The floating-point numbers are rounded to the nearest dis-

q is the user-defined parameter which denotes the


probability of selecting the best individual, and r is the
rank of the individual, where 1 is the best and S p is the
worst. Decreasing the value of q slows the convergence.
(5) Crossover: randomly pick a specified number of parents from selected individuals. An ospring is produced by crossing the parents with a simple, arithmetical, and heuristic crossover scheme. Simple crossover
creates two new individuals by splitting the parents in
a random point and swapping the parts. Arithmetical crossover produces two linear combinations of the
parents with a random weighting. Heuristic crossover
produces a single ospring 
xo which is a linear extrapolation of the two parents 
x p,1 and 
x p,2 as follows:



xo = h 
x p,2 
x p,1 + 
x p,2 ,

(19)

where 0 h 1 is a random number and the parent



x p,2 is not worse than 
x p,1 . Nonfeasible solutions are
possible and if no solution is found after w attempts,
the operator gives no ospring. Heuristic crossover
contributes to the precision of the final solution.
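As a concrete reading of steps (4) and (5), the selection probabilities of (17)-(18) and the heuristic crossover of (19) can be sketched as follows. This is a minimal sketch, not the authors' implementation; the feasibility check uses a simple per-component box, and the retry count w is handled as the text describes.

```python
import random

def geometric_selection_probs(q, S_p):
    """Normalized geometric ranking, eqs. (17)-(18): P_i = q'(1-q)^(r-1)
    with q' = q / (1 - (1-q)^S_p), for ranks r = 1 (best) .. S_p (worst)."""
    q_norm = q / (1.0 - (1.0 - q) ** S_p)
    return [q_norm * (1.0 - q) ** (r - 1) for r in range(1, S_p + 1)]

def heuristic_crossover(x_better, x_worse, lower, upper, w=3):
    """Eq. (19): x_o = h*(x_better - x_worse) + x_better with h ~ U(0,1),
    where x_better is the parent that is not worse. Retries up to w times
    if the offspring leaves the feasible box; returns None when no
    feasible offspring is found, as in the paper."""
    for _ in range(w):
        h = random.random()
        x_o = [h * (b - a) + b for a, b in zip(x_worse, x_better)]
        if all(lo <= v <= hi for v, lo, hi in zip(x_o, lower, upper)):
            return x_o
    return None

probs = geometric_selection_probs(q=0.08, S_p=60)
print(round(sum(probs), 6))  # probabilities sum to 1.0
```

Note how the normalization q' guarantees that the ranked probabilities form a proper distribution regardless of the population size S_p.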


(6) Mutation: randomly pick a specified number of individuals for mutation. Uniform, nonuniform, multi-nonuniform, and boundary mutation schemes are used. Mutation works with a single individual at a time. Uniform mutation sets a randomly selected parameter (gene) to a uniform random number between the boundaries. Nonuniform mutation operates uniformly at an early stage and more locally as the current generation approaches the maximum generation. We have defined the scheme to operate in such a way that the change is always at least one discrete step. The degree of nonuniformity is controlled with the parameter b. Nonuniformity is important for fine-tuning. Multi-nonuniform mutation changes all of the parameters in the current individual. Boundary mutation sets a parameter to one of its boundaries and is useful if the optimal solution is supposed to lie near the boundaries of the parameter space. The boundary mutation is used in special cases, such as staccato tones.

(7) Replace the current population with the new one.

(8) Repeat steps 3, 4, 5, 6, and 7 until termination.

Our algorithm is terminated when a specified number of generations has been produced. The number of generations defines the maximum duration of the algorithm. In our case, the time spent on the GA operations is negligible compared to the synthesis and fitness calculation. Synthesis of a tone with candidate parameter values takes approximately 0.5 seconds, while the error calculation takes 1.2 seconds. This makes 1.7 seconds in total for a single parameter set.
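The nonuniform mutation of step (6), modified so that the change is always at least one discrete step, might be sketched as follows. This is a hedged sketch, not the authors' code: the shrinking step-size schedule is assumed to follow Michalewicz [36], the gene is treated as an index into its discrete scale, and the 50/50 direction choice is our assumption.

```python
import random

def nonuniform_mutation_index(idx, n_values, t, T, b=3):
    """Nonuniform mutation on a discrete scale (a sketch of step (6)):
    the gene is an index into its discrete grid, the perturbation follows
    a Michalewicz-style shrinking schedule controlled by b, and the
    change is forced to be at least one discrete step."""
    span = n_values - 1
    # Shrinking step size: nearly uniform early, local near generation T.
    delta = span * (1.0 - random.random() ** ((1.0 - t / T) ** b))
    step = max(1, int(round(delta)))      # at least one discrete step
    if idx == 0:
        direction = 1                     # only one way to move
    elif idx == span:
        direction = -1
    else:
        direction = 1 if random.random() < 0.5 else -1
    return min(span, max(0, idx + direction * step))
```

With t close to T the step collapses to a single grid point, which is what makes nonuniform mutation useful for fine-tuning.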
7. EXPERIMENTATION AND RESULTS

To study the efficiency of the proposed method, we first tried to estimate the parameters for a sound produced by the synthesis model itself. First, the same excitation signal, extracted from a recorded tone by the method described in [24], was used for the target and output sounds. A more realistic case is simulated when the excitation for resynthesis is extracted from the target sound. The system was implemented with Matlab software and all runs were performed on an Intel Pentium III computer. We used the following parameters for all experiments: population size S_p = 60, number of generations = 400, probability of selecting the best individual q = 0.08, degree of nonuniformity b = 3, retries w = 3, number of crossovers = 18, and number of mutations = 18.
The pitch-synchronous Fourier transform scheme, where the window length L_w is synchronized with the period length of the signal such that L_w = 4 f_s / f_0, is utilized in this work. The overlap of the Hanning windows is 50%, implying a hop size H = L_w / 2. The sampling rate is f_s = 44100 Hz and the length of the FFT is N = 2048.
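For concreteness, the frame sizes implied by these settings can be checked numerically. Rounding the window length to an integer number of samples is our assumption; the paper only states L_w = 4 f_s / f_0, and the example f_0 below is taken from the experiments.

```python
# Analysis-frame sizes for the pitch-synchronous scheme described above.
fs = 44100              # sampling rate (Hz)
f0 = 330.5              # example fundamental frequency (Hz)

Lw = round(4 * fs / f0)  # window spans four periods (rounding assumed)
H = Lw // 2              # 50% overlap of the Hanning windows

print(Lw, H)  # 534 267
```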
The original and the estimated parameters for three experiments are shown in Table 2. In experiment 1, the original excitation is used for the resynthesis. The exact parameters are estimated for the difference d_f and for the decay

parameters g_h, g_v, and a_v. The adjacent point in the discrete grid is estimated for the decay parameter a_h. As can be seen in Figure 7, the sensitivity of the mean frequency is negligible compared to the difference d_f, which might be the cause of the deviations in the mean frequency. Differences in the mixing parameters m_o and m_p and the coupling coefficient g_c can be noticed. When running the algorithm multiple times, no explicit optima for the mixing and coupling parameters were found. However, the synthesized tones produced by the corresponding parameter values are indistinguishable. That is to say, the parameters m_p, m_o, and g_c are not orthogonal, which is clearly a problem with the model and also impairs the efficiency of our parameter estimation algorithm.
To overcome the nonorthogonality problem, we have run the algorithm with constant values m_p = m_o = 0.5 in experiment 2. If the target parameters are set according to the discrete grid, the exact parameters are estimated with zero error. The convergence of the parameters and the error for such a case is shown in Figure 11. Apart from the fact that the parameter values are estimated precisely, the convergence of the algorithm is very fast. Zero error is already found in generation 87.
A similar behavior is noticed in experiment 3, where an extracted excitation is used for resynthesis. The difference and the decay parameters g_h and g_v are again estimated precisely. The parameters m_p, m_o, and g_c drift as in the previous experiment. Interestingly, m_p = 1, which means that the straight path to the vertical polarization is totally closed. The model is, in a manner of speaking, rearranged in such a way that the individual string models are in series, as opposed to the original construction where the polarizations are arranged in parallel.

Unlike in experiments 1 and 2, the exact parameter values are not so relevant, since different excitation signals are used for the target and estimated tones. Rather than looking into the parameter values, it is better to analyze the tones produced with the parameters. In Figure 12, the overall temporal envelopes and the envelopes of the first eight partials for the target and for the estimated tone are presented. As can be seen, the overall temporal envelopes are almost identical and the partial envelopes match well. Only the beating amplitude differs slightly, but it is inaudible. This indicates that the parametrization of the model itself is not the best possible, since similar tones can be synthesized with various parameter sets.
Our estimation method is designed to be used with real recorded tones. Time and frequency analysis for such a case is shown in Figure 13. As can be seen, the overall temporal envelope and the partial envelopes for a recorded tone are very similar to those analyzed from a tone that uses the estimated parameter values. Appraisal of the perceptual quality of the synthesized tones is left as a future project, but our informal listening indicates that the quality is comparable with or better than our previous methods, and it does not require any hand tuning after the estimation procedure. Sound clips demonstrating these experiments are available at http://www.acoustics.hut.fi/publications/papers/jasp-ga.

Figure 11: Convergence of the seven parameters and the error for experiment 2 in Table 2: (a) convergence of the parameter f0, (b) convergence of the parameter d_f, (c) convergence of the parameters g_h and g_v, (d) convergence of the parameters a_h and a_v, (e) convergence of the parameter g_c, and (f) convergence of the error. Mixing coefficients are frozen at m_p = m_o = 0.5 to overcome the nonorthogonality problem. One hundred and fifty generations are shown and the original excitation is used for the resynthesis.


Table 2: Original and estimated parameters when a synthesized tone with known parameter values is used as a target tone. The original excitation is used for resynthesis in experiments 1 and 2, and the extracted excitation is used for resynthesis in experiment 3. In experiment 2, the mixing coefficients are frozen at m_p = m_o = 0.5.

Parameter | Target parameter | Experiment 1 | Experiment 2 | Experiment 3
f0        | 330.5409         | 331.000850   | 330.5409     | 330.00085
d_f       | 0.8987           | 0.8987       | 0.8987       | 0.8987
g_h       | 0.9873           | 0.9873       | 0.9873       | 0.9873
a_h       | 0.2905           | 0.3108       | 0.2905       | 0.2071
g_v       | 0.9907           | 0.9907       | 0.9907       | 0.9907
a_v       | 0.1936           | 0.1936       | 0.1936       | 0.1290
m_p       | 0.5              | 0.2603       | (0.5)        | 1.000
m_o       | 0.5              | 0.6971       | (0.5)        | 0.8715
g_c       | 0.1013           | 0.2628       | 0.1013       | 0.2450
Error     | --               | 0.0464       | 0            | 0.4131

Figure 12: Time and frequency analysis for experiment 3 in Table 2: (a) overall temporal envelope for the target tone, (b) first eight partials for the target tone, (c) overall temporal envelope for the estimated tone, and (d) first eight partials for the estimated tone. The synthesized target tone is produced with known parameter values and the synthesized tone uses estimated parameter values. Extracted excitation is used for the resynthesis.


Figure 13: Time and frequency analysis for a recorded tone and for a synthesized tone that uses estimated parameter values: (a) waveform for the recorded tone, (b) first eight partials for the recorded tone, (c) waveform for the estimated tone, and (d) first eight partials for the estimated tone. Extracted excitation is used for the resynthesis. Estimated parameter values are f0 = 331.1044, d_f = 1.1558, g_h = 0.9762, a_h = 0.4991, g_v = 0.9925, a_v = 0.0751, m_p = 0.1865, m_o = 0.7397, and g_c = 0.1250.

8. CONCLUSIONS AND FUTURE WORK

A parameter estimation scheme based on a GA with a perceptual fitness function was designed and tested for a plucked string synthesis algorithm. The synthesis algorithm is used for natural-sounding synthesis of various string instruments. For this purpose, automatic parameter estimation is needed. Previously, the parameter values have been extracted from recordings using more traditional signal processing techniques, such as the short-term Fourier transform, linear regression, and linear digital filter design. Some of the parameters could not be reliably estimated from the recorded sound signal, but have had to be fine-tuned manually by an expert user.

In this work, we presented a fully automatic parameter extraction method for string synthesis. The fitness function we use employs knowledge of the properties of the human auditory system, such as frequency-dependent sensitivity and frequency masking. In addition, a discrete parameter space has been designed for the synthesizer parameters. The range, the nonuniformity of the sampling grid, and the number of allowed values for each parameter were chosen based on former research results, experiments on parameter sensitivity, and informal listening.

The system was tested with both synthetic and real tones. The signals produced with the synthesis model itself are considered a particularly useful class of test signals because there will always be a parameter set that exactly reproduces the analyzed signal (although discretization of the parameter space may limit the accuracy in practice). Synthetic signals offered an excellent tool to evaluate the parameter estimation procedure, which was found to be accurate with two choices of excitation signal to the synthesis model. The quality of resynthesis of real recordings is more difficult to measure, as there are no known correct parameter values. As high-quality synthesis of several plucked string instrument sounds has been possible in the past with the same synthesis algorithm, we expected to hear good results using the GA-based method, which was also the case.

Appraisal of synthetic tones that use parameter values from the proposed GA-based method is left as a future project. Listening tests similar to those used for evaluating high-quality audio coding algorithms may be useful for this task.

REFERENCES

[1] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74-91, 1992.
[2] J. O. Smith, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference (ICMC '93), pp. 64-71, Tokyo, Japan, September 1993.
[3] M. Karjalainen, V. Välimäki, and Z. Jánosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. International Computer Music Conference (ICMC '93), pp. 56-63, Tokyo, Japan, September 1993.
[4] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331-353, 1996.
[5] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare, "Methods for modeling realistic playing in acoustic guitar synthesis," Computer Music Journal, vol. 25, no. 3, pp. 38-49, 2001.
[6] G. Weinreich, "Coupled piano strings," Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474-1484, 1977.
[7] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models: from the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17-32, 1998.
[8] T. Tolonen and V. Välimäki, "Automated parameter extraction for plucked string synthesis," in Proc. International Symposium on Musical Acoustics (ISMA '97), pp. 245-250, Edinburgh, Scotland, August 1997.
[9] C. Erkut, V. Välimäki, M. Karjalainen, and M. Laurson, "Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar," in the Audio Engineering Society 108th International Convention, Paris, France, February 2000, preprint 5114, http://lib.hut.fi/Diss/2002/isbn9512261901.
[10] A. Nackaerts, B. De Moor, and R. Lauwereins, "Parameter estimation for dual-polarization plucked string models," in Proc. International Computer Music Conference (ICMC '01), pp. 203-206, Havana, Cuba, September 2001.
[11] S.-F. Liang and A. W. Y. Su, "Recurrent neural-network-based physical model for the chin and other plucked-string instruments," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1045-1059, 2000.
[12] C. Drioli and D. Rocchesso, "Learning pseudo-physical models for sound synthesis and transformation," in Proc. IEEE International Conference on Systems, Man, and Cybernetics, pp. 1085-1090, San Diego, Calif, USA, October 1998.
[13] V.-V. Mattila and N. Zacharov, "Generalized listener selection (GLS) procedure," in the Audio Engineering Society 110th International Convention, Amsterdam, The Netherlands, 2001, preprint 5405.
[14] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.
[15] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. Laine, and J. Huopaniemi, "Frequency-warped signal processing for audio applications," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1011-1031, 2000.
[16] J. Vuori and V. Välimäki, "Parameter estimation of non-linear physical models by simulated evolution - application to the flute model," in Proc. International Computer Music Conference (ICMC '93), pp. 402-404, Tokyo, Japan, September 1993.
[17] A. Horner, J. Beauchamp, and L. Haken, "Machine tongues XVI: Genetic algorithms and their application to FM matching synthesis," Computer Music Journal, vol. 17, no. 4, pp. 17-29, 1993.
[18] R. Garcia, "Automatic generation of sound synthesis techniques," M.S. thesis, Massachusetts Institute of Technology, Cambridge, Mass, USA, 2001.
[19] C. Johnson, "Exploring the sound-space of synthesis algorithms using interactive genetic algorithms," in Proc. AISB Workshop on Artificial Intelligence and Musical Creativity, pp. 20-27, Edinburgh, Scotland, April 1999.
[20] D. Jaffe and J. O. Smith, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56-69, 1983.
[21] C. Erkut, M. Laurson, M. Kuuskankare, and V. Välimäki, "Model-based synthesis of the ud and the renaissance lute," in Proc. International Computer Music Conference (ICMC '01), pp. 119-122, Havana, Cuba, September 2001.
[22] C. Erkut and V. Välimäki, "Model-based sound synthesis of tanbur, a Turkish long-necked lute," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 769-772, Istanbul, Turkey, June 2000.
[23] C. Roads, The Computer Music Tutorial, MIT Press, Cambridge, Mass, USA, 1996.
[24] V. Välimäki and T. Tolonen, "Development and calibration of a guitar synthesizer," Journal of the Audio Engineering Society, vol. 46, no. 9, pp. 766-778, 1998.
[25] C. Erkut, M. Karjalainen, P. Huang, and V. Välimäki, "Acoustical analysis and model-based sound synthesis of the kantele," Journal of the Acoustical Society of America, vol. 112, no. 4, pp. 1681-1691, 2002.
[26] B. Bank, "Physics-based sound synthesis of the piano," Tech. Rep. 54, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland, May 2000, http://www.acoustics.hut.fi/publications/2000.html.
[27] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Springer-Verlag, Berlin, Germany, 1990.
[28] M. Lagrange and S. Marchand, "Real-time additive synthesis of sound by taking advantage of psychoacoustics," in Proc. COST-G6 Conference on Digital Audio Effects (DAFx '01), pp. 5-9, Limerick, Ireland, December 2001.
[29] C. W. Wun and A. Horner, "Perceptual wavetable matching for synthesis of musical instrument tones," Journal of the Audio Engineering Society, vol. 49, no. 4, pp. 250-262, 2001.
[30] J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314-323, 1988.
[31] M. R. Schroeder, B. S. Atal, and J. L. Hall, "Optimizing digital speech coders by exploiting masking properties of the human ear," Journal of the Acoustical Society of America, vol. 66, no. 6, pp. 1647-1652, 1979.
[32] H. Järveläinen and T. Tolonen, "Perceptual tolerances for decay parameters in plucked string synthesis," Journal of the Audio Engineering Society, vol. 49, no. 11, pp. 1049-1059, 2001.
[33] C. C. Wier, W. Jesteadt, and D. M. Green, "Frequency discrimination as a function of frequency and sensation level," Journal of the Acoustical Society of America, vol. 61, no. 1, pp. 178-184, 1977.
[34] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, Mass, USA, 1998.


[35] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[36] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, AI Series, Springer-Verlag, New York, NY, USA, 1992.
[37] J. Joines and C. Houck, "On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GAs," in IEEE International Symposium on Evolutionary Computation, pp. 579-584, Orlando, Fla, USA, June 1994.
Janne Riionheimo was born in Toronto,
Canada, in 1974. He studies acoustics and
digital signal processing at Helsinki University of Technology, Espoo, Finland, and music technology, as a secondary subject, at the
Centre for Music and Technology, Sibelius
Academy, Helsinki, Finland. He is currently
finishing his M.S. thesis, which deals with
parameter estimation of a physical synthesis
model. He has worked as a Research Assistant at the HUT Laboratory of Acoustics and Audio Signal Processing from 2001 until 2002. His research interests include physical
modeling of musical instruments and musical acoustics. He is also
working as a Recording Engineer.
Vesa Välimäki was born in Kuorevesi, Finland, in 1968. He received his Master of Science in Technology, Licentiate of Science in Technology, and Doctor of Science in Technology degrees, all in electrical engineering, from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. Dr. Välimäki worked at the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 until 2001. In 1996, he was a Postdoctoral Research Fellow at the University of Westminster, London, UK. He was appointed Docent in audio signal processing at HUT in 1999. During the academic year 2001-2002, he was Professor of Signal Processing at the Pori School of Technology and Economics, Tampere University of Technology, Pori, Finland. In August 2002, he returned to HUT, where he is currently Professor of Audio Signal Processing. His research interests are in the application of digital signal processing to audio and music. He has published more than 120 papers in international journals and conferences. He holds two patents. Dr. Välimäki is a senior member of the IEEE Signal Processing Society and a member of the Audio Engineering Society and the International Computer Music Association.


EURASIP Journal on Applied Signal Processing 2003:8, 806-813
© 2003 Hindawi Publishing Corporation


Optimization and Assessment of Wavelet Packet Decompositions with Evolutionary Computation
Thomas Schell
Department of Scientific Computing, University of Salzburg, Jakob Haringer Street 2, A-5020 Salzburg, Austria
Email: tschell@cosy.sbg.ac.at

Andreas Uhl
Department of Scientific Computing, University of Salzburg, Jakob Haringer Street 2, A-5020 Salzburg, Austria
Email: uhl@cosy.sbg.ac.at
Received 30 June 2002 and in revised form 27 November 2002
In image compression, the wavelet transformation is a state-of-the-art component. Recently, wavelet packet decomposition has received considerable interest. A popular approach for wavelet packet decomposition is the near-best-basis algorithm using nonadditive cost functions. In contrast to additive cost functions, the wavelet packet decomposition of the near-best-basis algorithm is only suboptimal. We apply methods from the field of evolutionary computation (EC) to test the quality of the near-best-basis results. We observe a phenomenon: the results of the near-best-basis algorithm are inferior in terms of cost-function optimization but superior in terms of rate/distortion performance compared to the EC methods.
Keywords and phrases: image compression, wavelet packets, best basis algorithm, genetic algorithms, random search.

1. INTRODUCTION

The DCT-based schemes for still-image compression (e.g., the JPEG standard [1]) have been superseded by wavelet-based schemes in recent years. Consequently, the new JPEG2000 standard [2] is based on the wavelet transformation. Apart from the pyramidal decomposition, JPEG2000 Part II also allows wavelet packet (WP) decomposition, which is of particular interest to our studies.

The WP-based image compression methods which have been developed [3, 4, 5, 6] outperform the most advanced wavelet coders (e.g., SPIHT [7]) significantly for textured images in terms of rate/distortion (r/d) performance.
In the context of image compression, a more advanced
but also more costly technique is to use a framework that
includes both rate and distortion, where the best-basis (BB)
subtree which minimizes the global distortion for a given
coding budget is searched [8, 9]. Other methods use fixed
bases of subbands for similar signals (e.g., fingerprints [10])
or search for good representations with general-purpose optimization methods [11, 12].
Usually in wavelet-based image compression, only the coarse-scale approximation subband is successively decomposed. With the WP decomposition, the detail subbands also lend themselves to further decomposition. From a practical point of view, each decomposed subband results in four new subbands: approximation, horizontal detail, vertical detail, and diagonal detail. Each of these four subbands can be recursively decomposed at will. Consequently, the decomposition can be represented by a quadtree.
Concerning WPs, a key issue is the choice of the decomposition quadtree. Obviously, not every subband must be decomposed further; therefore, a criterion which determines
whether a decomposition step should take place or not is
needed.
Coifman and Wickerhauser [13] introduced additive cost
functions and the BB algorithm which provides an optimal decomposition according to a specific cost metric.
Taswell [14] introduced nonadditive cost functions which are
thought to anticipate the properties of good decomposition quadtrees more accurately. With nonadditive cost functions, the BB algorithm mutates to a near-best-basis (NBB)
algorithm because the decomposition trees are only suboptimal. The divide-and-conquer principle of the BB relies on
the locality (additivity) of the underlying cost function. In
the case of nonadditive cost functions, this locality does not
exist.
In this work, we are interested in the assessment of
the WP decompositions provided by the NBB algorithm.
We focus on the quality of the NBB results in terms of



cost-function optimization as well as image quality (PSNR).
Both the cost-function value and the corresponding image quality of a WP decomposition are suboptimal due to the construction of the NBB algorithm.
We have interfaced the optimization process of WP decompositions by means of cost functions with the concepts of evolutionary computation (EC). Thereby, we obtain an alternative method to optimize WP decompositions by means of cost functions. Both approaches, NBB and EC, are subject to our experiments. The results provide valuable new insights concerning the intrinsic processes of the NBB algorithm. Our EC approach perfectly suits the needs of the assessment of the NBB algorithm, but, from a practical point of view, the EC approach is not competitive in terms of computational complexity.
In Section 2, we review the definitions of the cost functions which we analyze in our experiments. The NBB algorithm is described in Section 3. For the EC methods, we need a flat representation of quadtrees (Section 4). In Sections 5 and 6, we review genetic algorithms and random search specifically adapted to WP optimization. For our experiments, we apply an SPIHT-inspired software package for image compression by means of WP decomposition. Our central tool of analysis is the scatter plot of WP decompositions (Section 7). In Section 8, we compare the NBB algorithm and EC for optimizing WP decompositions.

that is, z1 = | yi1 j1 | zMN = | yiM jN |. Hence, the size


of vector z is MN. The cost-function value is calculated as
follows:
4,p

Cn (y) = max k1/ p zk .


k

COST FUNCTIONS

As a preliminary, we review the definitions of a cost function and the additivity. A cost function is a function C :
RM RN R. If y RM RN is a matrix of wavelet
coecients
 and C is a cost function, then C(0) = 0 and
C(y) = i, j C(yi j ). A cost function C is additive if and only
if


 

 

C a z 1 z2 = C a z 1 + Ca z 2 ,

(1)

where z1 , z2 RM RN are matrices of wavelet coecients.


The goal of any optimization algorithm is to identify a WP
decomposition with a minimal cost-function value.
Alternatively to the NBB algorithm (Section 3), we apply
methods from evolutionary computation (Sections 5 and 6)
to optimize WP decompositions. The fitness of a particular
WP decomposition is estimated with nonadditive cost functions. We employ the three nonadditive cost functions listed
below.
(i) Coifman Wickerhauser entropy. Coifman and Wickerhauser [15] defined the entropy for wavelet coecients as follows:
Cn1 (y) =


i, j:pi j 
=0

pi j ln pi j ,

pi j =

2
yi j

y
2

(2)

(ii) Weak l p Norm. For the weak l p norm [16], we need to


reorder and transform the coecients yi j . All coecients yi j
are rearranged in a decreasing absolute-value sorted vector z,

(3)

From the definition of the weak l p norm, we deduce that unfavorable slowly decreasing sequences or, in the worst case,
uniform sequences of vectors z cause high numerical values
of the norm, whereas fast decreasing zs result in low ones.
(iii) Shannon entropy. Below, we will consider the matrix y simply as a collection of real-valued coefficients x_i, 1 ≤ i ≤ MN. The matrix y is rearranged such that the first row is concatenated with the second row at the right side, then the new row is concatenated with the third row, and so on. With a simple histogram binning method, we estimate the probability mass function. The sample data interval is given by a = min_i x_i and b = max_i x_i. Given the number of bins J, the bin width w is w = (b − a)/J. The frequency f_j for the jth bin is defined by f_j = #{x_i | x_i ≤ a + jw} − Σ_{k=1}^{j−1} f_k. The probabilities p_j are calculated from the frequencies f_j simply by p_j = f_j / MN. From the obtained class probabilities, we can calculate the Shannon entropy [14]

    C_{n2,J}(y) = −Σ_{j=1}^{J} p_j log_2 p_j.        (4)
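The three cost functions can be written down almost verbatim from their definitions. The following sketch is our own illustration (function names and the default bin count are ours, not part of the paper's software); it implements equations (2), (3), and (4) for a coefficient matrix given as a list of rows:

```python
import math

def cw_entropy(coeffs):
    """Coifman-Wickerhauser entropy, Eq. (2): -sum p ln p with p_ij = y_ij^2 / ||y||^2."""
    flat = [c for row in coeffs for c in row]
    energy = sum(c * c for c in flat)
    return -sum((c * c / energy) * math.log(c * c / energy)
                for c in flat if c != 0.0)

def weak_lp_norm(coeffs, p=1.0):
    """Weak l^p norm, Eq. (3): max_k k^(1/p) |z_k| over the decreasing rearrangement z."""
    z = sorted((abs(c) for row in coeffs for c in row), reverse=True)
    return max((k ** (1.0 / p)) * zk for k, zk in enumerate(z, start=1))

def shannon_entropy(coeffs, J=64):
    """Binned Shannon entropy, Eq. (4): J histogram bins over [min, max]; assumes b > a."""
    x = [c for row in coeffs for c in row]
    a, b = min(x), max(x)
    w = (b - a) / J
    freq = [0] * J
    for xi in x:
        j = min(int((xi - a) / w), J - 1)   # clamp the maximum into the last bin
        freq[j] += 1
    return -sum((f / len(x)) * math.log2(f / len(x)) for f in freq if f > 0)
```

A fast-decaying coefficient matrix yields a low value for all three measures, which is why they serve as proxies for compressibility.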

Cost functions are an indirect strategy to optimize the image quality. PSNR can be seen as a nonadditive cost function. With a slightly modified NBB, PSNR as a cost function provides WP decompositions with an excellent r/d performance, but at the expense of high computational costs [12].
3. NBB ALGORITHM

With additive cost functions, a dynamic programming approach, that is, the BB algorithm [13], provides the optimal
WP decomposition with respect to the applied cost function.
Basically, the BB algorithm traverses the quadtree in a depthfirst-search manner and starts at the level right above the
leaves of the decomposition quadtree. The sum of the cost
of the children node is compared to the cost of the parent
node. If the sum is less than the cost of the parent node, the
situation remains unchanged. But, if the cost of the parent
node is less than the cost of the children, then the child nodes
are pruned o the tree. From bottom upwards, the tree is reduced whenever the cost of a certain branch can be reduced.
An illustrating example is presented in [15]. It is an essential
property of the BB algorithm that the decomposition tree is
optimal in terms of the cost criteria, but not in terms of the
obtained r/d performance.
When switching from additive to nonadditive cost functions, the locality of the cost function evaluation is lost. The
BB algorithm can still be applied because the correlation
among the subbands is assumed to be minor but obviously
the result is only suboptimal. Hence, instead of BB, this new
variant is called NBB [14].
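The bottom-up pruning step of the BB algorithm can be sketched in a few lines. The function below is our own illustration under a simplifying assumption: per-node costs are precomputed in a dictionary (rather than evaluated on actual subband coefficients), and nodes are indexed heap-style with children 4k + m as introduced in Section 4:

```python
def best_basis(cost, node=0, max_level=2, level=0):
    """Prune a full quadtree bottom-up (BB algorithm, additive cost).

    cost[node] is the cost of keeping `node` as a leaf; the children of
    `node` are 4*node + m, m = 1..4.  Returns (best_cost, kept_leaves).
    """
    if level == max_level:                       # deepest level: always a leaf
        return cost[node], [node]
    child_cost, child_leaves = 0.0, []
    for m in range(1, 5):
        c, l = best_basis(cost, 4 * node + m, max_level, level + 1)
        child_cost += c
        child_leaves += l
    if child_cost < cost[node]:                  # cheaper to decompose: keep children
        return child_cost, child_leaves
    return cost[node], [node]                    # otherwise prune the children off
```

With a nonadditive cost function the same traversal yields the NBB variant, but the returned tree is then only suboptimal because the child costs no longer sum to the parent cost.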

EURASIP Journal on Applied Signal Processing

4. ENCODING OF WP QUADTREES

To interface the WP software and the EC methods, we use a flat representation of a WP-decomposition quadtree. In other words, we want an encoding scheme for quadtrees in the form of a (binary) string. Therefore, we have adopted the idea of coding a heap in the heap-sort algorithm. We use strings b of finite length L over the binary alphabet {0, 1}. If the bit at index k, 1 ≤ k ≤ L, is set, then the according subband has to be decomposed. Otherwise, the decomposition stops in this branch of the tree:

    b_k = 1: decompose;    b_k = 0: stop.        (5)

If the bit at index k is set (b_k = 1), the indices of the resulting four subbands are derived by

    k_m = 4k + m,    1 ≤ m ≤ 4.        (6)

In heaps, the levels of the tree are implicit. We denote the maximal level of the quadtree by l_max ∈ N. At this level, all nodes are leaves of the quadtree. The level l of any node k in the quadtree can be determined by

    l = 0  for k = 0 (root);  otherwise l is the unique integer with
    Σ_{r=0}^{l−1} 4^r ≤ k < Σ_{r=0}^{l} 4^r,  k > 0.        (7)

The range of level l is 0 ≤ l ≤ l_max.
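The index arithmetic of (6) and (7) can be checked mechanically. The helpers below are our own illustration of the heap indexing (names are ours):

```python
def children(k):
    """Indices of the four subbands created by decomposing node k, Eq. (6)."""
    return [4 * k + m for m in range(1, 5)]

def level(k):
    """Level l of node k, Eq. (7): sum_{r<l} 4^r <= k < sum_{r<=l} 4^r for k > 0."""
    if k == 0:
        return 0
    l, lower = 1, 1                  # level-1 nodes start at index 1
    while not (lower <= k < lower + 4 ** l):
        lower += 4 ** l              # lower = sum_{r=0}^{l-1} 4^r
        l += 1
    return l
```

For example, node 0 has children 1..4 on level 1, node 2 has children 9..12 on level 2, exactly as used in the crossover example of Section 5.1.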


5. GENETIC ALGORITHM

Genetic algorithms (GAs) are evolution-based search algorithms especially designed for parameter optimization problems with vast search spaces. GAs were first proposed in the seventies by Holland [17]. Generally, parameter optimization problems consist of an objective function to evaluate and estimate the quality of an admissible parameter set, that is, a solution of the problem (not necessarily the optimal one, just any solution). For the GA, the parameter set needs to be encoded into a string over a finite alphabet (usually a binary alphabet). The encoded parameter set is called a genotype. Usually, the objective function is slightly modified to meet the requirements of the GA and hence is called a fitness function. The fitness function determines the quality (fitness) of each genotype (encoded solution). The combination of a genotype and the corresponding fitness forms an individual. At the start of an evolution process, an initial population, which consists of a fixed number of individuals, is generated randomly. In a selection process, individuals of high fitness are selected for recombination. The selection scheme mimics nature's principle of the survival of the fittest. During recombination, two individuals at a time exchange genetic material, that is, parts of the genotype strings are exchanged at random. After a new intermediate population has been created, a mutation operator is applied. The mutation operator randomly changes some of the alleles (values at certain positions/loci of the genotype) with a small probability in order to ensure that alleles which might have vanished from the population have a chance to reenter. After applying mutation, the intermediate population has turned into a new one (the next generation) replacing the former.
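The cycle just described (selection, recombination, mutation, replacement) reduces to a short loop for binary genotypes. The sketch below is a generic illustration with placeholder parameters, not the configuration used in the experiments; it maximizes the number of ones in a bitstring:

```python
import random

def evolve(fitness, length=16, pop_size=20, generations=50,
           p_cross=0.6, p_mut=0.01, seed=1):
    """Minimal generational GA: binary tournament selection,
    one-point crossover, and per-bit mutation on bitstrings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():                      # binary tournament selection
            a, b = rng.sample(pop, 2)
            return list(max(a, b, key=fitness))
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:         # one-point crossover
                cut = rng.randrange(1, length)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            nxt += [p1, p2]
        for ind in nxt:                        # mutation reintroduces lost alleles
            for i in range(length):
                if rng.random() < p_mut:
                    ind[i] ^= 1
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# one-max toy problem: the all-ones string is optimal
best = evolve(fitness=sum)
```

In the experiments below, the fitness function is a cost function evaluated on the WP decomposition that the bitstring encodes (Section 4), rather than this toy one-max objective.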
For our experiments, we apply a GA which starts with an initial population of 100 individuals. The initial population is generated randomly. The chromosomes are decoded into WP decompositions as described in Section 4. The fitness of the individuals is determined with a cost function (Section 2). Then, the standard cycle of selection, crossover, and mutation is repeated 100 times, that is, we evolve 100 generations of the initial population. The maximum number of generations was selected empirically such that selection schemes with a low selection pressure sufficiently converge. As selection methods, we use binary tournament selection (TS) with partial replacement [18] and linear ranking selection (LRK) with η = 0.9 [19]. We have experimented with two variants of crossover. Firstly, we applied standard two-point crossover, but obviously this type of crossover does not take into account the tree structure of the chromosomes. Additionally, we have conducted experiments with a tree-crossover operator (Section 5.1) which is specifically adapted to operations on quadtrees. For both two-point crossover and tree crossover, the crossover rate is set to 0.6 and the mutation rate is set to 0.01 for all experiments.

As a by-product, we obtained the results presented in Figure 1 for the image Barbara (Figure 5). Instead of a cost function, we apply the image quality (PSNR) to determine the fitness of an individual (i.e., a WP decomposition). We present the development of the PSNR during the course of a GA run. We show the GA results for the following parameter combinations: LRK and TS, each with either two-point crossover or with tree crossover. After every 100th sample (the population size of the GA) of the random search (RS, Section 6), we indicate the best-so-far WP decomposition. Obviously, each evaluation of a WP decomposition requires a full compression and decompression step, which causes a tremendous execution time. The result of an NBB optimization using the weak l1 norm is displayed as a horizontal line because the runtime of the NBB algorithm is far below the time which is required to evolve one generation of the GA. The PSNR of the NBB algorithm is out of reach for RS and GA. The tree-crossover operator does not improve the performance of the standard GA. The execution of a GA or RS run lasts from 6 to 10 days on an AMD Duron processor with 600 MHz. The GA using TS with and without tree crossover was not able to complete the 100 generations within this time limit. Further examples of WP optimization by means of EC are discussed in [20].
5.1. Tree crossover

Standard crossover operators (e.g., one-point or two-point crossover) have a considerably disruptive effect on the tree structure of subbands which is encoded into a binary string. With the encoding discussed above, a one- or two-point crossover results in two new individuals with tree structures which are almost unrelated to the tree structures of their

Optimization and Assessment of Wavelet Packet Decompositions with Evolutionary Computation


[Figure 1: Comparison of NBB, GA, and RS. PSNR (approximately 24.7 to 25.5 dB) versus generations (0 to 100) for NBB: weak l1 norm; RS; GA: TS (t = 2); GA: LRK (η = 0.9); GA: TS (t = 2), tree crossover; GA: LRK (η = 0.9), tree crossover.]
Table 1: Chromosomes of two individuals.

Locus:  1  2  3  4  5  6  7  8  9 10 11 12 13
A:      1  0  1  0  0  1  0  0  0  0  0  1  1
B:      1  1  0  1  1  0  0  1  1  1  1  0  0

[Figure 2: Parent individuals before crossover. Panel (a): individual A; panel (b): individual B.]

parents. This obviously contradicts the basic idea of a GA, that is, the GA is expected to evolve better individuals from good parents.

To demonstrate the effect of standard one-point crossover, we present a simple example. The chromosomes of the parent individuals A and B are listed in Table 1 and the according binary trees are shown in Figure 2. As a cut point for the crossover, we choose the gap between genes 6 and 7. The chromosome parts from locus 7 to the right end of the chromosome are exchanged between individuals A and B. This results in two new trees (i.e., individuals A' and B') which are displayed in Figure 3. Evidently, the new generation of trees differs considerably from their parents.

The notion is to introduce a problem-inspired crossover such that the overall tree structure is preserved while only local parts of the subband trees are altered [11]. Specifically, one node in each individual (i.e., subband tree) is chosen at random, then the according subtrees are exchanged between the individuals. In our example, the candidate nodes for the crossover are node 2 in individual A and node 10 in individual B. The tree crossover produces a new pair of descendants A'' and B'' which are displayed in Figure 4. Compared to the standard crossover operator, tree crossover only moderately alters the structure of the parent individuals when generating new ones.
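Under the flat heap encoding of Section 4, exchanging the subtrees rooted at the two chosen nodes amounts to moving bits between the corresponding index sets. The sketch below is our own realization; in particular, how bits are treated when they fall outside the fixed-length string after the move (dropped) or are missing in the source (taken as 0, i.e., no further decomposition) is our assumption, not specified in the text:

```python
def swap_subtrees(a, b, na, nb):
    """Tree crossover: exchange the subtree rooted at na in chromosome a
    with the subtree rooted at nb in chromosome b.

    a, b: bit lists with 1-based heap indexing (children of k are 4k + m).
    Returns the two offspring; the parents are left unchanged.
    """
    a2, b2 = a[:], b[:]

    def copy(src, src_root, dst, dst_root):
        # walk both subtrees in parallel, copying bits position by position
        stack = [(src_root, dst_root)]
        while stack:
            ks, kd = stack.pop()
            bit = src[ks - 1] if ks <= len(src) else 0
            if kd <= len(dst):
                dst[kd - 1] = bit
            for m in range(1, 5):
                if 4 * ks + m <= len(src) or 4 * kd + m <= len(dst):
                    stack.append((4 * ks + m, 4 * kd + m))

    copy(a, na, b2, nb)
    copy(b, nb, a2, na)
    return a2, b2
```

Applied to the chromosomes of Table 1 with crossover nodes 2 and 10, only the bits of the two exchanged subtrees change, while the rest of each chromosome is preserved.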
[Figure 3: Individuals after conventional one-point crossover. Panel (a): individual A'; panel (b): individual B'.]

6. RANDOM SEARCH

The random generation of WP decompositions is not straightforward due to the quadtree structure. If we consider a 0/1 string as an encoded quadtree (Section 4), we could

obtain random WP decompositions just by creating random 0/1 strings of a given length. An obvious drawback is that this method acts in favor of small quadtrees.

[Figure 4: Individuals after tree crossover. Panel (a): individual A''; panel (b): individual B''.]

[Figure 5: Barbara.]

We assume that the root node always exists and that it is on level l = 0.


This is a useful assumption because we need at least one wavelet decomposition. The probability to obtain a node at level l is (1/2)^l. Due to the rapidly decreasing probabilities, the quadtrees will be rather sparse.
Another, admittedly theoretical, approach would be to assign a uniform probability to all possible quadtrees. Then, this set is sampled for WP decompositions. Some simple considerations will show that in this case small quadtrees are practically excluded from evaluation. In the following, we calculate the number A(k) of trees with nodes on equal or less than k levels. If k = 0, then we have A(0) := 1 because there is only the root node on level l = 0. For A(k), we obtain the recursion A(k) = [1 + A(k − 1)]^4 because we can construct quadtrees of height equal to or less than k by adding a new root node to trees of height k − 1: each of the four child positions is either empty or carries such a tree. The number of quadtrees B(k) of height k is given by B(0) := 1 and B(k) = A(k) − A(k − 1), k ≥ 1. From the latter argument, we see that the number B(k) of quadtrees of height k increases exponentially. Consequently, the fraction of trees of low height is diminishing and hence, when uniformly sampling the set of quadtrees, they are almost excluded from the evaluation.
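The recursion can be tabulated directly; a short sketch (our own) makes the imbalance concrete:

```python
def tree_counts(kmax):
    """A(k) = number of quadtrees with nodes on at most k levels:
    A(0) = 1, A(k) = (1 + A(k-1))**4.
    B(k) = A(k) - A(k-1) counts quadtrees of height exactly k."""
    A = [1]
    for _ in range(1, kmax + 1):
        A.append((1 + A[-1]) ** 4)
    B = [1] + [A[k] - A[k - 1] for k in range(1, kmax + 1)]
    return A, B

A, B = tree_counts(3)
# A grows as 1, 16, 83521, ~4.87e19: trees of full height dwarf all smaller ones
```

Already B(3)/A(3) exceeds 0.999, so a uniform sample almost never contains a small tree.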
With image compression in mind, we are interested in trees of low height because trees with a low number of nodes and a simple structure require fewer resources when encoded into a bitstream. Therefore, we have adopted the RS approach of the first paragraph with a minor modification. We require that the approximation subband is decomposed at least down to level 4 because it usually contains a considerable amount of the overall signal energy.
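The modified random generator translates directly into a recursive coin-flipping procedure. The sketch below is our own illustration; representing a decomposition as the set of split node indices, and treating child m = 1 as the approximation branch, are our conventions:

```python
import random

def random_wp_tree(max_level, forced_approx_level=4, rng=random):
    """Generate a random WP quadtree as the set of decomposed node indices.

    Each reachable node is split with probability 1/2, so a node at level l
    survives with probability (1/2)^l; the approximation branch (child m = 1)
    is always decomposed down to `forced_approx_level`.
    """
    decomposed = set()

    def visit(node, level, on_approx_path):
        if level >= max_level:
            return
        force = on_approx_path and level < forced_approx_level
        if force or rng.random() < 0.5:
            decomposed.add(node)
            for m in range(1, 5):
                visit(4 * node + m, level + 1, on_approx_path and m == 1)
        # otherwise the node stays a leaf

    visit(0, 0, True)
    return decomposed
```

Whatever the coin flips produce, the chain 0, 1, 5, 21 (the approximation subband down to level 4) is always decomposed, matching the modification described above.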

Similar to the GA, we can apply the RS using PSNR instead of cost functions to evaluate WP decompositions. Using an RS as discussed above with a decomposition depth of at least 4 for the approximation subband, we generate 4000 almost unique samples of WP decompositions and evaluate the corresponding PSNR. The WP decomposition with the highest PSNR value is recorded. We have repeated the single RS runs at least 90 times. The best three results in decreasing order and the least result of a single RS run for the image Barbara are as follows: 24.648, 24.6418, 24.6368, ..., 24.4094.
If we compare the results of the RS to those obtained by
NBB with cost function weak l1 norm (PSNR 25.47), we realize that the RS is about 1 dB below the NBB algorithm. To
increase the probability of a high quality result of the RS, a
drastic increase of the sample size is required, which again
would result in a tremendous increase of the RS runtime.
7. CORRELATION OF COST FUNCTIONS AND IMAGE QUALITY

Our experiments are based on a test library of images with a broad spectrum of visual features. In this work, we present the results for the well-known image Barbara. The considerable amount of texture in this test picture demonstrates the superior performance of the WP approach in principle.

The output of the NBB, GA, and RS is a WP decomposition. WPs are a generalization of the pyramidal decomposition. Therefore, we apply an algorithm similar to SPIHT which exploits the hierarchical structure of the wavelet coefficients [21] (SMAWZ). SMAWZ uses the foundations of SPIHT, most importantly the zero-tree paradigm, and adapts them to WPs.
Cost functions are the central design element in the NBB algorithm. The working hypothesis of (additive and nonadditive) cost functions is that a WP decomposition with an optimal cost-function value also provides a (sub)optimal r/d performance. The optimization of WP decompositions via cost functions is an indirect strategy. Therefore, we compare the results of the EC methods to that of the NBB algorithm
by generating scatter plots. In these plots, we simultaneously provide, for each WP decomposition, the information about the cost-function value and the image quality (PSNR).

[Figure 6: Correlation between Coifman-Wickerhauser entropy and PSNR (random WPs).]

[Figure 7: Correlation between Coifman-Wickerhauser entropy and PSNR for WP decompositions obtained by NBB, RS, and GA (variants: TS (t = 2), LRK (η = 0.9), each with and without tree crossover).]
Figure 6 displays the correlation of the nonadditive cost function Coifman-Wickerhauser entropy and the PSNR. For the plot, we generated 1000 random WP decompositions and calculated the value of the cost function and the PSNR after a compression to 0.1 bpp. Note that WP decompositions with the same decomposition level of the approximation subband are grouped into clouds.
8. QUALITY OF THE NBB ALGORITHM WITH RESPECT TO COST-FUNCTION OPTIMIZATION

The basic idea of our assessment of the NBB algorithm is to use the GA to evolve WP decompositions by means of cost-function optimization. Therefore, we choose some nonadditive cost functions and compute WP decompositions with the NBB algorithm, a GA, and an RS. For each cost function, we obtain a collection of suboptimal WP decompositions. We calculate the PSNR for each of the WP decompositions and generate scatter plots (PSNR versus cost-function value). The comparison of the NBB, GA, and RS results provides surprising insight into the intrinsic processes of the NBB algorithm.

We apply the GA and RS as discussed in Sections 5 and 6, using the nonadditive cost functions Coifman-Wickerhauser entropy, weak l1 norm, and Shannon entropy to optimize WP decompositions. The GA as well as the RS generate and evaluate 10^4 WP decompositions. The image Barbara is decomposed according to the output of NBB, GA, and RS and compressed to 0.1 bpp. Afterwards, we determine the PSNR between the original and the decompressed image.
In Figure 7, we present the plot of the correlation between the Coifman-Wickerhauser entropy and PSNR for NBB, GA, and RS. The WP decomposition obtained by the NBB algorithm is displayed as a single dot. The other dots represent the best individual found either by an RS or a GA run. With the Coifman-Wickerhauser entropy, we notice a defect in the construction of the cost function. Even though the GA and RS provide WP decompositions with a cost-function value less than that of the NBB, the WP decomposition of the NBB is superior in terms of image quality. As a matter of fact, the NBB provides suboptimal WP decompositions with respect to the Coifman-Wickerhauser entropy.

The correlation between the weak l1 norm and PSNR is displayed in Figure 8. Similar to the scatter plot of the Coifman-Wickerhauser entropy, the WP decomposition of the NBB is an isolated dot. But this time, the GA and the RS are not able to provide a WP decomposition with a cost-function value less than the cost-function value of the NBB-WP decomposition.

Even more interesting is the cost function Shannon entropy (Figure 9). Similar to the Coifman-Wickerhauser entropy, the Shannon entropy provides WP decompositions with a cost-function value lower than that of the NBB. In the upper right of the figure, there is a singular result of the GA using TS. This WP decomposition has an even higher cost-function value than the one of the NBB but is superior in terms of PSNR.

In general, the GA employing LRK provides better results than the GA using TS concerning the cost-function values. Within the GA-LRK results, there seems to be a slight advantage for the tree crossover. In all three figures, the GA-LRK with and without tree crossover is clearly ahead of the RS. This is evidence for a more efficient optimization process of the GA compared to RS.

In two cases (Figures 7 and 9), we observe the best cost-function values for the GA- and the RS-WP decompositions. Nevertheless, the NBB-WP decomposition provides higher image quality with an inferior cost-function value. The singular result for the GA of Figure 9 is yet another example

for this phenomenon. As a result, the correlation of the cost-function value and the PSNR, as indicated in all three scatter plots, is imperfect. (In the case of perfect correlation, we would observe a line starting in the right and descending to the left.)

The NBB algorithm generates WP decompositions according to split and combine decisions based on cost-function evaluations. In contrast, RS and GA generate a complete WP decomposition, and the cost-function value is computed afterwards. The overall cost-function values of NBB, RS, and GA fail to consistently predict the image quality, that is, a lower cost-function value does not assert a higher image quality.

[Figure 8: Correlation between weak l1 norm and PSNR for WP decompositions obtained by NBB, RS, and GA.]

[Figure 9: Correlation between Shannon entropy and PSNR for WP decompositions obtained by NBB, RS, and GA. The results of GA: TS (t = 2), tree crossover are not displayed due to zooming.]

9. SUMMARY

The NBB algorithm for WP decomposition provides, by construction, only suboptimal cost-function values as well as suboptimal image quality. We are interested in an assessment of the quality of the NBB results.

We have adapted a GA and an RS to the problem of WP-decomposition optimization by means of additive and nonadditive cost functions. For the GA, a problem-inspired crossover operator was implemented to reduce the disruptive effect on decomposition trees when recombining the chromosomes of WP decompositions. Obviously, the computational complexity of RS and GA is exorbitantly higher than that of the NBB algorithm. But the RS and GA are in this case helper applications for the assessment of the NBB algorithm.

We compute WP decompositions with the NBB algorithm, the RS, and the GA. The central tool of analysis is the correlation between the cost-function value and the corresponding PSNR of WP decompositions, which we visualize with scatter plots. The scatter plots reveal the imperfect correlation between cost-function value and image quality for WP decompositions for all of the presented nonadditive cost functions. This also holds true for many other additive and nonadditive cost functions. We observed that the NBB-WP decomposition provided excellent image quality even though the corresponding cost-function value was sometimes considerably inferior compared to the results of the RS and GA. Consequently, our results revealed defects in the prediction of image quality by means of cost functions.

With the RS and GA at hand, we applied minor modifications to these algorithms. Instead of employing cost functions for optimizing WP decompositions, we used the PSNR as a fitness function, which resulted in a further increase of computational complexity because each evaluation of a WP decomposition requires a full compression and decompression step. Hereby, we directly optimize the image quality. This direct approach of optimizing WP decompositions with GA and RS, employing PSNR as a fitness function, requires further improvement to exceed the performance of the NBB.

REFERENCES

[1] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY, USA, 1993.
[2] D. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic Publishers, Boston, Mass, USA, 2002.
[3] J. R. Goldschneider and E. A. Riskin, "Optimal bit allocation and best-basis selection for wavelet packets and TSVQ," IEEE Trans. Image Processing, vol. 8, no. 9, pp. 1305-1309, 1999.
[4] F. G. Meyer, A. Z. Averbuch, and J.-O. Stromberg, "Fast adaptive wavelet packet image compression," IEEE Trans. Image Processing, vol. 9, no. 5, pp. 792-800, 2000.
[5] R. Oktem, L. Oktem, and K. Egiazarian, "Wavelet based image compression by adaptive scanning of transform coefficients," Journal of Electronic Imaging, vol. 11, no. 2, pp. 257-261, 2002.
[6] Z. Xiong, K. Ramchandran, and M. T. Orchard, "Wavelet packet image coding using space-frequency quantization," IEEE Trans. Image Processing, vol. 7, no. 6, pp. 892-898, 1998.
[7] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, 1996.
[8] K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. Image Processing, vol. 2, no. 2, pp. 160-175, 1993.
[9] N. M. Rajpoot, R. G. Wilson, F. G. Meyer, and R. R. Coifman, "A new basis selection paradigm for wavelet packet image coding," in Proc. International Conference on Image Processing (ICIP '01), pp. 816-819, Thessaloniki, Greece, October 2001.
[10] T. Hopper, "Compression of gray-scale fingerprint images," in Wavelet Applications, H. H. Szu, Ed., vol. 2242 of SPIE Proceedings, pp. 180-187, Orlando, Fla, USA, 1994.
[11] T. Schell and A. Uhl, "Customized evolutionary optimization of subband structures for wavelet packet image compression," in Advances in Fuzzy Systems and Evolutionary Computation, N. Mastorakis, Ed., pp. 293-298, World Scientific Engineering Society, Puerto de la Cruz, Spain, February 2001.
[12] T. Schell and A. Uhl, "New models for generating optimal wavelet-packet-tree-structures," in Proc. 3rd IEEE Benelux Signal Processing Symposium (SPS '02), pp. 225-228, IEEE Benelux Signal Processing Chapter, Leuven, Belgium, March 2002.
[13] R. R. Coifman and M. V. Wickerhauser, "Entropy based algorithms for best basis selection," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 713-718, 1992.
[14] C. Taswell, "Satisficing search algorithms for selecting near-best bases in adaptive tree-structured wavelet transforms," IEEE Transactions on Signal Processing, vol. 44, no. 10, pp. 2423-2438, 1996.
[15] M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A. K. Peters, Wellesley, Mass, USA, 1994.
[16] C. Taswell, "Near-best basis selection algorithms with nonadditive information cost functions," in Proc. IEEE International Symposium on Time-Frequency and Time-Scale Analysis (TFTS '94), M. Amin, Ed., pp. 13-16, IEEE Press, Philadelphia, Pa, USA, October 1994.
[17] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[18] T. Schell and S. Wegenkittl, "Looking beyond selection probabilities: adaptation of the chi-square measure for the performance analysis of selection methods in GAs," Evolutionary Computation, vol. 9, no. 2, pp. 243-256, 2001.
[19] J. E. Baker, "Adaptive selection methods for genetic algorithms," in Proc. 1st International Conference on Genetic Algorithms and Their Applications, J. J. Grefenstette, Ed., pp. 101-111, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, July 1985.
[20] T. Schell, Evolutionary optimization: selection schemes, sampling and applications in image processing and pseudo random number generation, Ph.D. thesis, University of Salzburg, Salzburg, Austria, 2001.
[21] R. Kutil, "A significance map based adaptive wavelet zerotree codec (SMAWZ)," in Media Processors 2002, S. Panchanathan, V. Bove, and S. I. Sudharsanan, Eds., vol. 4674 of SPIE Proceedings, pp. 61-71, San Jose, Calif, USA, January 2002.


Thomas Schell received his M.S. degree in computer science from Salzburg University, Austria, and from Bowling Green State University, USA, and a Ph.D. from Salzburg University. Currently, he is with the Department of Scientific Computing as a Research and Teaching Assistant at Salzburg University. His research focuses on evolutionary computing and signal processing, especially image compression.

Andreas Uhl received the B.S. and M.S. degrees (both in mathematics) from Salzburg University and completed his Ph.D. on applied mathematics at the same university. He is currently an Associate Professor with tenure in computer science affiliated with the Department of Scientific Computing and with the Research Institute for Software Technology, Salzburg University. He is also a part-time lecturer at the Carinthia Tech Institute. His research interests include multimedia signal processing (with emphasis on compression and security issues), parallel and distributed processing, and number theoretical methods in numerics.

EURASIP Journal on Applied Signal Processing 2003:8, 814-823
(c) 2003 Hindawi Publishing Corporation

On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition Systems in Adverse Conditions

Sid-Ahmed Selouani
Secteur Gestion de l'Information, Université de Moncton, Campus de Shippagan, 218 boulevard J.-D.-Gauthier, Shippagan, Nouveau-Brunswick, Canada E8S 1P6
Email: selouani@umcs.ca

Douglas O'Shaughnessy
INRS-Énergie-Matériaux-Télécommunications, Université du Québec, 800 de la Gauchetière Ouest, place Bonaventure, Montréal, Canada H5A 1K6
Email: dougo@inrs-telecom.uquebec.ca

Received 14 June 2002 and in revised form 6 December 2002

Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axes issued from the KLT. The enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique, when included in the front-end of an HTK-based CSR system, outperforms the conventional recognition process in severe interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to -4 dB. We also show the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.

Keywords and phrases: speech recognition, genetic algorithms, Karhunen-Loève transform, hidden Markov models, robustness.

1. INTRODUCTION

Continuous speech recognition (CSR) systems remain faced with the serious problem of acoustic condition changes. Their performance often degrades due to unknown adverse conditions (e.g., due to room acoustics, ambient noise, speaker variability, sensor characteristics, and other transmission channel artifacts). These speech variations create mismatches between the training data and the test data. Numerous techniques have been developed to counter this in three major areas [1].

The first area includes noise masking [1], spectral and cepstral subtraction [2], and the use of robust features [3]. Robust feature analysis consists of using noise-resistant parameters such as auditory-based features and mel-frequency cepstral coefficients (MFCC) [4], or techniques such as the relative spectral (RASTA) methodology [5]. The second type of method refers to the establishment of compensation models for noisy environments without modification of the speech signal. The third field of research is concerned with distance and similarity measurements. The major methods of this field are founded on the principle of finding a robust distortion measure that emphasizes the regions of the spectrum that are less influenced by noise [6].
Despite these efforts to address robustness, adapting to changing environments remains the major obstacle to speech recognition in practical applications. Investigating innovative strategies has become essential to overcome the drawbacks of classical approaches. In this context, evolutionary algorithms (EAs) are robust solutions: they are useful for finding good solutions to complex problems (artificial neural network topologies or weights, for instance) and for avoiding local minima [7]. Applying artificial neural networks, Spalanzani [8] showed that recognition of digits and vowels can be improved by using genetically optimized initialization of weights and biases. In this paper, we propose an approach which can be viewed as a signal transformation via a mapping operator using a mel-frequency space decomposition based on the Karhunen-Loève transform (KLT) and a genetic algorithm (GA) with real-coded encoding (a part of EAs). This transformation attempts to adapt hidden Markov model-based CSR systems to adverse conditions. The principle consists of finding, in the learning phase, the principal axes generated by the KLT and then optimizing them for the
Evolutionary Algorithms for Noisy Speech Recognition


projection of noisy data by genetic operators. The aim is to provide projected noisy data that are as close as possible to clean data.

This paper is organized as follows. Section 2 describes the basis of our proposed hybrid KLT-GA enhancement method. Section 3 describes the model linking the KLT to the evolution mechanism, which leads to a robust representation of noisy data. Then, Section 4 describes the database and the platform used in our experiments, and the evaluation of the proposed KLT-GA-based recognizer in a noisy car environment and in a telephone channel environment. This section includes the comparison of KLT-GA processed recognizers to a baseline CSR system in order to evaluate performance. Finally, Section 5 concludes with perspectives of this work.
2. OVERALL STRUCTURE OF THE KLT-GA-BASED ROBUST SYSTEM

2.1. General framework


CSR systems based on statistical models such as hidden
Markov models (HMM) automatically recognize speech
sounds by comparing their acoustic features with those determined during training [9]. A Bayesian statistical framework underlies the HMM speech recognizer. The development of such a recognizer can be summarized as follows. Let
w be a sequence of phones (or words), which produces a sequence of observable acoustic data o, sent through a noisy
transmission channel. In our study, telephone speech is corrupted by additive noise. The recognition process aims to
provide the most likely phone sequence w given the acoustic
data o. This estimation is performed by maximizing a posteriori (MAP) the p(w | o) probability:
$$\hat{w} = \arg\max_{w \in \Omega} p(w \mid o) = \arg\max_{w \in \Omega} p(o \mid w)\, p(w), \qquad (1)$$
where Ω is the set of all possible phone sequences, p(w) is the prior probability, determined by the language model, that the speaker utters w, and p(o | w) is the conditional probability that the acoustic channel produces the sequence o. Let λ be the set of models used by the recognizer to decode acoustic parameters through the use of the MAP. Then (1) becomes
$$\hat{w} = \arg\max_{w \in \Omega} p(o \mid w, \lambda)\, p(w). \qquad (2)$$

The mismatch between the training and the testing environments leads to a worse estimate of the likelihood of o given λ and thus degrades CSR performance. Reducing this mismatch should increase the correct recognition rate. The mismatch can be viewed by considering the signal space, the feature space, or the model space. We are concerned with the feature space, and consider a transformation T that maps it into a transformed feature space. Our approach is to find the T* and the phone sequence ŵ that maximize the joint likelihood of o and w given λ:

$$[T^{*}, \hat{w}] = \arg\max_{w \in \Omega,\, T} p(o \mid w, T, \lambda)\, p(w). \qquad (3)$$

We propose a pseudojoint maximization over w and T, where the typical conventional HMM-based technique is used to estimate w, while an EA-based technique enhances noisy data iteratively by keeping the noisy features as close as possible to the clean data. This EA-based transformation aims to reduce the mismatch between training and operating conditions by giving the HMM the ability to recall the training conditions.

As shown in Figure 1, the idea is to manipulate the axes generating the feature representation space to achieve better robustness on noisy data. MFCCs serve as acoustic features. A Karhunen-Loève decomposition in the MFCC domain yields the principal axes that constitute the basis of the space where noisy data are represented. Then, a population of these axes is created (corresponding to individuals in the initialization of the evolution process). The evolution of the individuals is performed by EAs. The individuals are evaluated via a fitness function by quantifying, through generations, their distance to individuals in a noise-free environment. The fittest individual (best principal axes) is used to project the noisy data in its corresponding dimension. Genetically modified MFCCs and their derivatives are finally used as enhanced features for the recognition process.

2.2. Cepstral acoustic features

The cepstrum is defined as the inverse Fourier transform of the logarithm of the short-term power spectrum of the signal. The use of a logarithmic function allows deconvolution of the vocal tract transfer function and the voice source. Consequently, the pulse sequence corresponding to the periodic voice source reappears in the cepstrum as a strong peak. The derived cepstral coefficients are commonly used to describe the short-term spectral envelope of a speech signal. The computation of MFCCs requires the selection of M critical bandpass filters that roughly approximate the frequency response of the basilar membrane in the cochlea of the inner ear [4]. A discrete cosine transform, C_n, is applied to the outputs of the M filters, X_k. These filters are triangular, cover the 156–6844 Hz frequency range, and are spaced on the mel-frequency scale. They are applied to the log of the magnitude spectrum of the signal, which is estimated on a short-time basis. Thus

$$C_{n} = \sum_{k=1}^{M} X_{k} \cos\!\left(\frac{\pi n (k - 0.5)}{M}\right), \qquad n = 1, 2, \ldots, N, \qquad (4)$$

where N is the number of cepstral coefficients, M is the analysis order, and X_k, k = 1, 2, ..., M = 20, represents the log-energy output of the kth filter.
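As a concrete illustration, the DCT of (4) can be written in a few lines of NumPy (a sketch with hypothetical function and variable names; M = 20 filterbank log-energies and N = 12 coefficients match the values used later in Section 4):

```python
import numpy as np

def mfcc_from_log_energies(X, N=12):
    """Apply the DCT of Eq. (4): C_n = sum_k X_k * cos(pi*n*(k-0.5)/M)."""
    M = len(X)                                   # analysis order (number of filters)
    k = np.arange(1, M + 1)                      # filter index k = 1..M
    return np.array([np.sum(X * np.cos(np.pi * n * (k - 0.5) / M))
                     for n in range(1, N + 1)])  # C_n for n = 1..N

# Toy usage: 20 filterbank log-energies -> 12 cepstral coefficients
X = np.log(np.abs(np.random.randn(20)) + 1.0)
C = mfcc_from_log_energies(X)
print(C.shape)  # (12,)
```

A useful sanity check: a flat (constant) filterbank output yields C_n = 0 for every n ≥ 1, since the cosine basis sums to zero over the half-sample grid.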
2.3. KLT in the mel-frequency domain

In order to reduce the effects of noise on ASR, many methods propose to decompose the vector space of the noisy signal
into a signal-plus-noise subspace and a noise subspace [10].
We remove the noise subspace and estimate the clean signal
from the remaining signal space. Such a decomposition applies the KLT to the noisy zero-mean normalized data.

[Figure 1: block diagram — clean and noisy speech undergo MFC analysis; a KLT decomposition produces axes that individuals and genetic operators evolve; enhanced MFCCs feed HMM recognition.]
Figure 1: General overview of the KLT-EA-based CSR robust system.

If we apply such a decomposition to the noisy zero-mean normalized MFCC vector C̃ = [C̃_1, C̃_2, ..., C̃_N]^T, with the assumption that C̃ has a symmetric nonnegative autocorrelation matrix R = E[C̃ C̃^T] with a rank r ≤ N, then C̃ can be represented as a linear combination of eigenvectors φ_1, φ_2, ..., φ_r, which correspond to the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_r ≥ 0, respectively. That is, C̃ can be calculated using the following orthogonal transformation:

$$\tilde{C} = \sum_{k=1}^{r} \alpha_{k} \varphi_{k}, \qquad k = 1, \ldots, r, \qquad (5)$$

where the coefficients α_k (principal components) are given by the projection of C̃ onto the space generated by the r-eigenvector basis. Given that the magnitudes of the low-order eigenvalues are higher than those of the high-order ones, the effect of the noise on the low-order eigenvalues is proportionately less than on the high-order ones. Thus, a linear estimate of the clean vector C is obtained by projecting the noisy vectors onto the space generated by the principal components weighted by a function W_k, which applies strong attenuation to the higher-order eigenvectors depending on the noise variance [10]. The enhanced MFCCs are then given by

$$\hat{C} = \sum_{k=1}^{r} W_{k} \alpha_{k} \varphi_{k}, \qquad k = 1, \ldots, r. \qquad (6)$$

Various methods can find an adequate weighting function, particularly in the case of signal subspace decomposition [10]. The optimal order r fixing the beginning of the strong attenuation must be determined. In our new approach, GAs determine the optimal principal components, and no such assumptions need to be made. Optimization is achieved when the vectors ψ_1, ψ_2, ..., ψ_N, which do not necessarily correspond to the eigenvectors, minimize the Euclidean distance between C̃ and C. The genetically enhanced MFCCs, Ĉ_Gen, are

$$\hat{C}_{\text{Gen}} = \sum_{k=1}^{N} \alpha_{k} \psi_{k}, \qquad k = 1, \ldots, N. \qquad (7)$$

Determining an optimal r is not needed since the GA considers the vectors ψ_1, ψ_2, ..., ψ_N as the fittest individuals for the complete space dimension N. This process can be regarded as the mapping transform T* of (3).
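The decomposition and weighted reconstruction of (5)-(6) can be sketched with NumPy's eigendecomposition (names and the step-function weighting are illustrative assumptions; in the proposed method, the GA of Section 3 evolves the axes ψ_k instead of weighting fixed eigenvectors as in (7)):

```python
import numpy as np

def klt_enhance(frames, W):
    """Eqs. (5)-(6): project zero-mean MFCC frames onto the KLT axes
    and reconstruct with per-axis weights W_k."""
    R = frames.T @ frames / len(frames)          # autocorrelation matrix R
    eigvals, Phi = np.linalg.eigh(R)             # eigenvectors of R
    Phi = Phi[:, np.argsort(eigvals)[::-1]]      # order so lambda_1 >= ... >= lambda_r
    alpha = frames @ Phi                         # principal components alpha_k (Eq. 5)
    return (alpha * W) @ Phi.T                   # weighted reconstruction (Eq. 6)

# Toy usage: 200 zero-mean 12-dim MFCC frames; attenuate the 6 high-order axes
frames = np.random.randn(200, 12)
frames -= frames.mean(axis=0)
W = np.array([1.0] * 6 + [0.0] * 6)
enhanced = klt_enhance(frames, W)
print(enhanced.shape)  # (200, 12)
```

With W_k = 1 for all k the reconstruction is exact, since the eigenvector basis is orthonormal; the enhancement comes entirely from the attenuation profile.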
3. MODEL DESCRIPTION AND EVOLUTION

The use of GAs requires resolution of six fundamental issues:


the chromosome (or solution) representation, the selection
function, the genetic operators making up the reproduction
function, the creation of the initial population, the termination criteria, and the evaluation function [11, 12]. The GA
maintains and manipulates a family or population of solutions (the ψ_1, ψ_2, ..., ψ_N vectors in our case) and implements
a survival of the fittest strategy in its search for better solutions.
3.1. Solution representation

A chromosome representation describes each individual in


the population. It is important since the representation
scheme determines how the problem is structured in the
GA and also determines the adequate genetic operators to
use [13]. For our application, the useful representation of
an individual or chromosome for function optimization involves genes or variables from an alphabet of floating-point
numbers with values within the variables' upper and lower bounds (−1 and +1, resp.). Michalewicz [14] has done extensive experimentation comparing real-valued and binary GAs,


and has shown that real-valued representation offers higher


precision with more consistent results across replications.

1. Fix g = U(0, 1), a uniform random number
2. Compute fit[X] and fit[Y], the fitness of X and Y
3. If fit[X] > fit[Y]
   Then X′ = X + g(X − Y) and Y′ = X
   Estimate the feasibility of X′:
   Φ(X′) = 1 if a_i ≤ x_i ≤ b_i for all i; 0 otherwise
   (x_i are the components of X′, i = 1, ..., N)
4. If Φ(X′) = 0
   Then generate a new g; go to 2
5. If all individuals have reproduced, then Stop;
   else go to 1

Algorithm 1: The heuristic crossover used in the CSR robust system.

3.2. Selection function

Stochastic selection is used to keep search strategies simple while allowing adaptivity. The selection of individuals to produce successive generations plays an extremely important role in GAs. A common selection approach assigns a probability of selection, P_j, to each individual, j, based on its fitness value. Various methods exist to assign probabilities to individuals; we use the normalized geometric ranking [15]. This method defines P_j for each individual by

$$P_{j} = q' (1 - q)^{s-1}, \qquad (8)$$

where

$$q' = \frac{q}{1 - (1 - q)^{P}}, \qquad (9)$$

where q is the probability of selecting the best individual, s is the rank of the individual (1 being the best), and P is the population size.

3.3. Genetic operators

The basic search mechanism of the GA is provided by two types of operators: crossover and mutation. Crossover transforms two individuals into two new individuals, while mutation alters one individual to produce a single solution. A float representation of the parents is denoted by X and Y. At the end of the search, the fittest individual survives and is retained as an optimal KLT axis in its corresponding rank of the ψ_1, ψ_2, ..., ψ_N vectors.

3.3.1 Crossover

Crossover operators combine information from two parents and transmit it to each offspring. In order to avoid extending the exploration domain of the best solution, we preferred to use a crossover that utilizes fitness information, that is, a heuristic crossover [15]. Let a_i and b_i be the lower and upper bounds, respectively, of each component x_i representing a member of the population (X or Y). This operator produces a linear interpolation of X and Y. New individuals X′ and Y′ (children) are created according to Algorithm 1.

3.3.2 Mutation

Mutation operators tend to make small random changes in an attempt to explore all regions of the solution space [16]. The principle of the nonuniform mutation used in our application consists of randomly selecting one component, x_k, of an individual and setting it equal to a nonuniform random number, x′_k (otherwise, the original values of the components are maintained):

$$x'_{k} = \begin{cases} x_{k} + \left( b_{k} - x_{k} \right) f(\text{Gen}) & \text{if } u_{1} < 0.5, \\ x_{k} - \left( x_{k} - a_{k} \right) f(\text{Gen}) & \text{if } u_{1} \geq 0.5, \end{cases} \qquad (10)$$

where the function f(Gen) is given by

$$f(\text{Gen}) = \left( u_{2} \left( 1 - \frac{\text{Gen}}{\text{Gen}_{\max}} \right) \right)^{t}, \qquad (11)$$

where u_1, u_2 are uniform random numbers in (0, 1), t is a shape parameter, Gen is the current generation, and Gen_max is the maximum number of generations. The multi-nonuniform mutation generalizes the application of the nonuniform mutation operator to all the components of the parent X. The main advantage of this operator is that the alteration is distributed over all the components of the individual, which extends the search space and thus permits dealing with any kind of noise.

3.4. Evaluation function

The GA must search all the axes generated by the KLT of the mel-frequency space (that make up the noisy MFCCs if they are projected onto these axes) to find the closest to the clean MFCCs. Thus, evolution is driven by a fitness function defined in terms of a distance measure between the noisy MFCC projected on a given individual (axis) and the clean MFCC. The fittest individual is the axis which corresponds to the minimum of that distance. The distance function applied to cepstral (or other voice) representations refers to spectral distortion measures and represents the cost in a classification system of speech frames. For two vectors C and C̃ representing two frames [6], each with N components, the geometric distance is defined as

$$d(C, \tilde{C}) = \left( \sum_{k=1}^{N} \left| C_{k} - \tilde{C}_{k} \right|^{l} \right)^{1/l}. \qquad (12)$$

For simplicity, the Euclidean distance is considered (l = 2), which has been a valuable measure for both clean and noisy speech [6, 17]. Figure 2 gives, for the first four best axes, the evolution of their fitness (distortion measure) through 300 generations. Note that −d(C, C̃) is used because the evaluation function must be maximized.
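The two operators can be sketched as follows (a NumPy illustration with hypothetical names; X is assumed to be the fitter parent, the bounds are the [−1, +1] of Section 3.1, and the retry limit is an assumption, since Algorithm 1 itself loops until a feasible child is found):

```python
import numpy as np

rng = np.random.default_rng(0)

def heuristic_crossover(X, Y, a=-1.0, b=1.0, max_retries=10):
    """Algorithm 1: X' = X + g(X - Y), Y' = X, retrying g until X' is feasible."""
    for _ in range(max_retries):
        g = rng.uniform(0.0, 1.0)                    # step 1
        X_child = X + g * (X - Y)                    # step 3
        if np.all((a <= X_child) & (X_child <= b)):  # feasibility Phi(X') = 1
            return X_child, X.copy()
    return X.copy(), Y.copy()                        # fall back to the parents

def multi_nonuniform_mutation(X, gen, gen_max, t=3.0, a=-1.0, b=1.0):
    """Eqs. (10)-(11) applied to every component of the parent X."""
    u1 = rng.uniform(size=X.shape)
    u2 = rng.uniform(size=X.shape)
    f = (u2 * (1.0 - gen / gen_max)) ** t            # Eq. (11)
    return np.where(u1 < 0.5,
                    X + (b - X) * f,                 # move toward the upper bound
                    X - (X - a) * f)                 # move toward the lower bound

X = np.linspace(-0.5, 0.5, 12)                       # fitter parent
Y = np.zeros(12)
child_x, child_y = heuristic_crossover(X, Y)
mutant = multi_nonuniform_mutation(X, gen=10, gen_max=300)
```

Because f(Gen) decays to zero as Gen approaches Gen_max, mutations shrink over time, shifting the search from exploration to fine-tuning.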

[Figure 2: four panels plotting first, second, third, and fourth axis fitness versus generation (0-300).]
Figure 2: Evolution of the performances of the best individual during 300 generations. Only the first four axes are considered among the twelve.

3.5. Initialization and termination


The ideal, zero-knowledge assumption starts with a population of completely random axes. Another typical heuristic,
used in our system, initializes the population with a uniform distribution in a default set of known starting points
described by the boundaries (ai , bi ) for each axis component. The GA-based search ends when the population gets
homogeneity in performance (when children do not surpass
their parents), converges according to the Euclidean distortion measure, or is terminated when the maximum number of generations is reached. Finally, the evolution process can be summarized in Algorithm 2.
4. EXPERIMENTS

4.1. Speech material


The following experiments used the TIMIT database [18],
which contains broadband recordings of a total of 6300 sentences: 10 phonetically rich sentences read by each of 630 speakers from 8 major dialect regions of the United States. To simulate a noisy environment, car noise was added artificially to the clean speech.
To study the effect of such noise on the recognition accuracy
of the CSR system that we evaluated, the reference templates
for all tests were taken from clean speech. The training set is
composed of 1140 sentences (114 speakers) of dr1 and dr2
TIMIT subdirectories. On the other hand, the dr1 subset of
the TIMIT database, composed of 110 sentences, was chosen
to evaluate the recognition system.
In a second set of experiments, and in order to study
the impact of telephone channel degradation on recognition
accuracy of both baseline and enhanced CSR systems, the
NTIMIT database was used [19]. It was created by transmitting speech from the TIMIT database over long-distance telephone lines. Previous work has demonstrated that telephone
line use increases the rate of recognition errors; for example,
Moreno and Stern [20] report a 68% error rate by using a
version of SPHINX-II [21] as CSR system, TIMIT as training
database, and NTIMIT database, for the test.


Fix the number of generations Gen_max and the boundaries of the axes
Generate for each principal KLT component a population of axes
For Gen_max generations Do
    For each set of components Do
        Project noisy data using KLT axes
        Evaluate global Euclidean distance for clean data
    End For
    Select and Reproduce
End For
Project noisy data onto the space generated by the best individuals

Algorithm 2: The evolutionary search technique for the best KLT axes.
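A compact sketch of this search loop (toy data, and a simplified reproduction step standing in for the selection, crossover, and mutation operators of Section 3; all names are illustrative, and the fitness is the negative distortion of (12) so that maximization matches Section 3.4):

```python
import numpy as np

rng = np.random.default_rng(1)

def distortion(axis, clean, noisy):
    """Euclidean distortion (Eq. 12, l = 2) between projections on one axis."""
    return np.linalg.norm(noisy @ axis - clean @ axis)

def evolve_axes(clean, noisy, pop_size=20, gen_max=40):
    """Algorithm 2: evolve, for each KLT component, a population of axes and
    keep the one whose noisy projection stays closest to the clean data."""
    dim = clean.shape[1]
    best_axes = []
    for _ in range(dim):                         # one population per component
        pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
        for _ in range(gen_max):
            fitness = np.array([-distortion(ax, clean, noisy) for ax in pop])
            parents = pop[np.argsort(fitness)[::-1][: pop_size // 2]]
            children = np.clip(parents + 0.05 * rng.standard_normal(parents.shape),
                               -1.0, 1.0)        # crude stand-in for reproduction
            pop = np.vstack([parents, children])
        fitness = np.array([-distortion(ax, clean, noisy) for ax in pop])
        best_axes.append(pop[int(np.argmax(fitness))])
    return np.stack(best_axes, axis=1)           # columns = evolved axes

clean = np.random.randn(60, 4)
noisy = clean + 0.3 * np.random.randn(60, 4)
axes = evolve_axes(clean, noisy)
print(axes.shape)  # (4, 4)
```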

4.2. CSR platform


In order to test the recognition of continuous speech data
enhanced as described above, the HTK-based speech recognizer [22] was used. HTK is an HMM-based toolkit used for
isolated or continuous whole-word-based recognition systems. The toolkit supports continuous-density HMMs with
any number of states and mixture components. It also implements a general parameter-tying mechanism which allows
the creation of complex model topologies. Twelve MFCCs
were calculated using a 30-millisecond Hamming window
advanced by 10 milliseconds for each frame. To do this, an
FFT calculates a magnitude spectrum for each frame, which
is then averaged into 20 triangular bins arranged at equal
mel-frequency intervals. Finally, a cosine transform is applied to such data to calculate the 12 MFCCs which form
a 12-dimensional (static) vector. This static vector is then
expanded after enhancement to produce a 36-dimensional
(static + first and second derivatives: MFCC_D_A) vector
upon which the HMMs, that model the speech subword
units, were trained. With this frame length, the
1140 sentences of dr1 and dr2 TIMIT subsets provided
342993 frames that were used for the training. The baseline system used a triphone Gaussian mixture HMM system. Triphones were trained through a tree-based clustering
method to deal with unseen context. A set of binary questions about phonetic contexts is built; the decision tree is
constructed by selecting the best question from the rule set
at each node [23].
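The expansion from the 12 static MFCCs to the 36-dimensional MFCC_D_A vector can be sketched with the standard regression formula for delta coefficients (the ±2-frame window and edge padding are assumptions following common HTK practice, not details stated here):

```python
import numpy as np

def deltas(x, w=2):
    """Regression delta coefficients over a +/- w frame window."""
    T = len(x)
    p = np.pad(x, ((w, w), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2.0 * sum(i * i for i in range(1, w + 1))
    return np.stack([
        sum(i * (p[t + w + i] - p[t + w - i]) for i in range(1, w + 1)) / denom
        for t in range(T)
    ])

def mfcc_d_a(static):
    """12 static MFCCs per frame -> 36-dim static + delta + acceleration."""
    d = deltas(static)
    return np.hstack([static, d, deltas(d)])

frames = np.random.randn(100, 12)                 # 100 frames of 12 MFCCs
features = mfcc_d_a(frames)
print(features.shape)  # (100, 36)
```

The acceleration coefficients are simply the deltas of the deltas, so a constant input produces zeros in all 24 derivative dimensions.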
4.3. Results and discussion
4.3.1 GA parameters
A population of 150 individuals is generated for each ψ_k and
evolves during 300 generations. The values for the GA parameters given in Table 1 were selected after extensive crossvalidation experiments and were shown to perform well with
all data. The maximum number of generations needed and
the population size are well adapted to our problem since
no improvement was observed when these parameters were
increased. At each generation, the best individuals are retained to reproduce. At the end of the evolution process, the
best individuals of the best population are considered as the

Table 1: Values of the parameters used in the GA.

Parameter                               Parameter value
Number of generations                   300
Population size                         150
Probability of selecting the best, q    0.08
Heuristic crossover rate                0.25
Multi-nonuniform mutation rate          0.06
Number of runs                          50
Number of frames                        114331
Boundaries [a_i, b_i]                   [−1.0, +1.0]
optimized KLT axes. This method is used by Houk et al. in


[15]. For this purpose, data sets are composed of 114331
frames extracted from the TIMIT training subset and corresponding noisy frames extracted from the noisy TIMIT and
NTIMIT databases.
4.3.2 CSR under additive car noise environment

Experiments were done using the noisy version of TIMIT


at different values of SNR, from 16 dB to 4 dB. Figure 3
shows that using the KLT-GA-based optimization to enhance
the MFCCs that were used for recognition with N-mixture
Gaussian HMMs for N = 1, 2, 4, 8 with triphone models
leads to a higher word recognition rate. The CSR system
including the KLT-GA-processed MFCCs performs significantly better than the MFCC_D_A- and KLT-MFCC_D_A-based CSR systems, for both low and high noise conditions. The
system which contains enhanced MFCCs achieves 81.67% as
the best word recognition rate (%CWrd) for 16-dB SNR and
four Gaussian mixtures. In the same conditions, the baseline
system dealing with noisy MFCCs and the system containing KLT-processed MFCCs achieve, respectively, 73.89% and
77.25%. The increased accuracy is more significant in low
SNR conditions, which attests to the robustness of the approach when acoustic conditions become severely degraded.
For instance, in the 4-dB SNR case, the KLT-GA-MFCC-based CSR system has accuracy higher than the KLT-MFCC- and MFCC-based CSR systems by 12% and 20%, respectively. The comparison between KLT- and KLT-GA-processed

[Figure 3: four panels, (a)-(d), plotting % recognition rate versus SNR (dB) for the baseline, KLT, and KLT-GA systems.]
Figure 3: Percent word recognition performance (%CWrd) of the KLT- and KLT-GA-based CSR systems compared to the baseline HTK method (noisy MFCC) using (a) 1-mixture, (b) 2-mixture, (c) 4-mixture, and (d) 8-mixture triphones for different values of SNR.

MFCCs shows that the proposed evolutionary approach is


more powerful whatever the level of noise degradation.
Considering the KLT-based CSR, inclusion of the GA technique raised accuracy by about 11%. Figure 4 plots the
variations of the first four MFCCs for a signal that has been
chosen from the test set. It is clear from the comparison illustrated in this figure that the processed MFCCs, using the
proposed KLT-GA-based approach, are less variant than the
noisy MFCCs and closer to the original ones.

4.3.3 Speech under telephone channel degradation


Extensive experimental studies characterized the impairments induced by telephone networks [24]. When speech is
recorded through telephone lines, a reduction in the analysis
bandwidth yields higher recognition error, particularly when
the system is trained with high-quality speech and tested using simulated telephone speech [20]. In our experiments, the
training set (dr1 and dr2 subdirectories of TIMIT) (1140 sentences and 342993 frames) was used to train a set of clean

[Figure 4: four panels plotting the first, second, third, and fourth MFCC versus frame number (50-250).]
Figure 4: Comparison between clean, noisy, and enhanced MFCCs, represented by solid, dotted, and dashed-dotted lines, respectively.

speech models. The dr1 subdirectory of NTIMIT was used


as a test set. This subdirectory is composed of 110 sentences
and 34964 frames. Speakers and sentences used in the test
were different from those used in the training phase. For
the KLT- and KLT-GA-based CSR systems, we found that
using the KLT-GA as a preprocessing approach to enhance
the MFCCs that were used for recognition with N-mixture
Gaussian HMMs for N = 1, 2, 4, and 8, using triphone models, led to an important improvement in the accuracy of the
word recognition rate. Table 2 shows that this difference can reach 27% between the MFCC_D_A- and KLT-GA-MFCC_D_A-based CSR systems. Table 2 also shows that substitution and insertion errors are considerably reduced when the evolutionary approach is included, which makes the CSR system more effective.
5. CONCLUSION

We have illustrated the suitability of EAs, particularly the


GAs, for an important real-world application by presenting

a new robust CSR system. This system is based on the use


of a KLT-GA hybrid enhancement noise reduction approach
in the cepstral domain in order to get less-variant parameters. Experiments show that the use of the enhanced parameters using such a hybrid approach increases the recognition
rate of the CSR process in highly interfering car noise environments for a wide range of SNRs varying from 16 dB to
4 dB, and when speech is subjected to telephone channel degradation. The approach can be applied whatever the distortion of the vectors, provided that the fitness function can be identified. The front-end of the proposed KLT-GA-based
CSR system does not require any a priori knowledge about
the nature of the corrupting noisy signal, which allows dealing with any kind of noise. Moreover, using this enhancement technique avoids the noise estimation process that requires a speech/nonspeech preclassification, which may not be accurate for low SNRs. It is also interesting to note that
such a technique is less complex than many other enhancement techniques, which need to either model or compensate
for the noise. However, this enhancement technique requires


Table 2: Percentages of word recognition rate (%CWrd), insertion rate (%Ins), deletion rate (%Del), and substitution rate (%Sub) of the MFCC_D_A-, KLT-MFCC_D_A-, and KLT-GA-MFCC_D_A-based HTK CSR systems using (a) 1-mixture, (b) 2-mixture, (c) 4-mixture, and (d) 8-mixture triphone models.

(a) %CWrd using 1-mixture triphone models.

                      %Sub    %Del    %Ins    %CWrd
MFCC_D_A              82.71   4.27    33.44   13.02
KLT-MFCC_D_A          77.05   5.11    30.04   17.84
KLT-GA-MFCC_D_A       54.48   5.42    25.42   40.10

(b) %CWrd using 2-mixture triphone models.

                      %Sub    %Del    %Ins    %CWrd
MFCC_D_A              81.25   3.44    38.44   15.31
KLT-MFCC_D_A          78.11   3.81    48.89   18.08
KLT-GA-MFCC_D_A       52.40   4.27    52.40   43.33

(c) %CWrd using 4-mixture triphone models.

                      %Sub    %Del    %Ins    %CWrd
MFCC_D_A              78.85   3.75    38.23   17.40
KLT-MFCC_D_A          76.27   4.88    39.54   18.85
KLT-GA-MFCC_D_A       49.69   5.62    25.31   44.69

(d) %CWrd using 8-mixture triphone models.

                      %Sub    %Del    %Ins    %CWrd
MFCC_D_A              78.02   3.96    40.83   18.02
KLT-MFCC_D_A          77.36   5.37    34.62   17.32
KLT-GA-MFCC_D_A       48.41   6.56    26.46   45.00

a large amount of data in order to find the best individual. Many other directions remain open for further work.
Present goals include analyzing evolved genetic parameters,
evaluating how performance scales with other types of noise
(nonstationary, limited band, etc.).


REFERENCES
[1] Y. Gong, "Speech recognition in noisy environments: A survey," Speech Communication, vol. 16, no. 3, pp. 261-291, 1995.
[2] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, 1979.
[3] D. Mansour and B. H. Juang, "A family of distortion measures based upon projection operation for robust speech recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 11, pp. 1659-1671, 1989.
[4] S. B. Davis and P. Mermelstein, "Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980.
[5] H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, "RASTA-PLP speech analysis technique," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 121-124, San Francisco, Calif, USA, March 1992.
[6] J. Hernando and C. Nadeu, "A comparative study of parameters and distances for noisy speech recognition," in Proc. Eurospeech '91, pp. 91-94, Genova, Italy, September 1991.
[7] C. R. Reeves and S. J. Taylor, "Selection of training data for neural networks by a genetic algorithm," in Parallel Problem Solving from Nature, pp. 633-642, Springer-Verlag, Amsterdam, The Netherlands, September 1998.
[8] A. Spalanzani, S.-A. Selouani, and H. Kabre, "Evolutionary algorithms for optimizing speech data projection," in Genetic and Evolutionary Computation Conference, p. 1799, Orlando, Fla, USA, July 1999.
[9] D. O'Shaughnessy, Speech Communications: Human and Machine, IEEE Press, Piscataway, NJ, USA, 2nd edition, 2000.
[10] Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, 1995.
[11] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.
[12] J. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[13] L. B. Booker, D. E. Goldberg, and J. H. Holland, "Classifier systems and genetic algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235-282, 1989.
[14] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, AI series, Springer-Verlag, New York, NY, USA, 1992.
[15] C. R. Houk, J. A. Joines, and M. G. Kay, "A genetic algorithm for function optimization: a Matlab implementation," Tech. Rep. 95-09, North Carolina State University, Raleigh, NC, USA, 1995.
[16] L. Davis, Ed., The Genetic Algorithm Handbook, chapter 17, Van Nostrand Reinhold, New York, NY, USA, 1991.
[17] B. H. Juang, L. R. Rabiner, and J. G. Wilpon, "On the use of bandpass liftering in speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 765-768, Tokyo, Japan, April 1986.
[18] W. M. Fisher, G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database: specifications and status," in Proc. DARPA Speech Recognition Workshop, pp. 93-99, Palo Alto, Calif, USA, February 1986.
[19] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz, "NTIMIT: A phonetically balanced, continuous speech telephone bandwidth speech database," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 109-112, Albuquerque, NM, USA, April 1990.
[20] P. J. Moreno and R. M. Stern, "Sources of degradation of speech recognition in the telephone network," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 109-112, Adelaide, Australia, April 1994.
[21] X. D. Huang, F. Alleva, H. W. Hon, M. Y. Hwang, K. F. Lee, and R. Rosenfeld, "The SPHINX-II speech recognition system: An overview," Computer, Speech and Language, vol. 7, no. 2, pp. 137-148, 1993.
[22] Cambridge University Speech Group, The HTK Book (Version 2.1.1), Cambridge University Group, March 1997.
[23] L. R. Bahl, P. V. de Souza, P. S. Gopalakrishnan, D. Nahamoo, and M. A. Picheny, "Decision trees for phonological rules in continuous speech," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 185-188, Toronto, Canada, May 1991.
[24] W. D. Gaylor, Telephone Voice Transmission. Standards and Measurements, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.

Evolutionary Algorithms for Noisy Speech Recognition


Sid-Ahmed Selouani received his B.E. degree in 1987 and his M.S. degree in 1991, both in electronic engineering from the University of Science and Technology of Algeria (U.S.T.H.B). He joined the Communication Langagière et Interaction Personne-Système (CLIPS) Laboratory of Université Joseph Fourier of Grenoble, taking part in the Algerian-French double degree program, and then obtained a Docteur d'État degree in the field of speech recognition in 2000 from the University of Science and Technology of Algeria. From 2000 to 2002, he held a postdoctoral fellowship in the Multimedia Group at the Institut National de Recherche Scientifique (INRS-Telecommunications) in Montreal. He taught from 1991 to 2000 at the University of Science and Technology of Algeria before starting to work as an Assistant Professor at the Université de Moncton, Campus de Shippagan. He is also an Invited Professor at INRS-Telecommunications. His main areas of research involve speech recognition robustness and speaker adaptation by evolutionary techniques, auditory front-ends for speech recognition, integration of acoustic-phonetic indicative features knowledge in speech recognition, hybrid connectionist/stochastic approaches in speech recognition, language identification, and speech enhancement.
Douglas O'Shaughnessy has been a Professor at INRS-Telecommunications (University of Quebec) in Montreal, Canada,
since 1977. For this same period, he has
been an Adjunct Professor in the Department of Electrical Engineering, McGill University. Dr. O'Shaughnessy has worked as a
Teacher and Researcher in the speech communication field for 30 years. His interests
include automatic speech synthesis, analysis, coding and recognition. His research team is currently working
to improve various aspects of automatic voice dialogues in English
and French. He received his education from the Massachusetts Institute of Technology, Cambridge, MA (B.S. and M.S. degrees in
1972; Ph.D. degree in 1976). He is a Fellow of the Acoustical Society of America (1992) and an IEEE Senior Member (1989). From
1995 to 1999, he served as an Associate Editor for the IEEE Transactions on Speech and Audio Processing, and has been an Associate
Editor for the Journal of the Acoustical Society of America since
1998. Dr. OShaughnessy has been selected as the General Chair of
the 2004 International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Montreal, Canada. He is the author of
the textbook Speech Communications: Human and Machine (IEEE
press, 2000).
EURASIP Journal on Applied Signal Processing 2003:8, 824–833
© 2003 Hindawi Publishing Corporation

Evolutionary Techniques for Image Processing a Large Dataset of Early Drosophila Gene Expression
Alexander Spirov
Department of Applied Mathematics and Statistics and The Center for Developmental Genetics, Stony Brook University,
Stony Brook, NY 11794-3600, USA
The Sechenov Institute of Evolutionary Physiology and Biochemistry, Russian Academy of Sciences, 44 Thorez Avenue,
St. Petersburg 194223, Russia
Email: spirov@kruppel.ams.sunysb.edu

David M. Holloway
Mathematics Department, British Columbia Institute of Technology, Burnaby, British Columbia, Canada V5G 3H2
Chemistry Department, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z1
Email: david_holloway@bcit.ca
Received 10 July 2002 and in revised form 1 December 2002
Understanding how genetic networks act in embryonic development requires a detailed and statistically significant dataset integrating diverse observational results. The fruit fly (Drosophila melanogaster) is used as a model organism for studying developmental genetics. In recent years, several laboratories have systematically gathered confocal microscopy images of patterns of activity (expression) for genes governing early Drosophila development. Due to both the high variability between fruit fly embryos and diverse sources of observational errors, some new nontrivial procedures for processing and integrating the raw observations are required. Here we describe processing techniques based on genetic algorithms and discuss their efficacy in decreasing observational errors and illuminating the natural variability in gene expression patterns. The specific developmental problem studied is anteroposterior specification of the body plan.
Keywords and phrases: image processing, elastic deformations, genetic algorithms, observational errors, variability, fluctuations.

1. INTRODUCTION

Functional genomics is an emerging field within biology aimed at deciphering how the blueprints of the body plan encrypted in DNA become a living, spatially patterned organism. Key to this process are ensembles of control genes acting
in concert to govern particular events in embryonic development. During developmental events, genes encoded in the
DNA are converted into spatial expression patterns on the
scale of the embryo. The genes, and their products, are active
players in regulating this pattern formation. In the first few
hours of fruit fly (Drosophila melanogaster) development, a
network of some 15–20 genes establishes a striped pattern of
gene expression around the embryo [1, 2] (Figure 1). These
stripes are the first manifestation of the segments which characterize the anteroposterior (AP) (head-to-tail) organization
of the fly body plan. Similar segmentation events occur in
other animals, including humans. Drosophila research helps
to understand the genetics underlying such processes.
Though Drosophila may be a relatively easy organism in which to do developmental genetics, there remain many experimental problems to be resolved. One of these is the processing of large sets of gene expression images in order to achieve an integrated and statistically significant detailed view of the segmentation process.
It is not possible to observe all segmentation genes at
once in the same embryo over the duration of patterning.
Single embryos can be imaged for a maximum of three
segmentation genes. Embryos are killed in the fixing process prior to imaging. Therefore, data sets integrated from
multiple embryos, stained for the variety of segmentation
genes, and over the patterning period, are necessary for
gaining a complete picture of segmentation dynamics. In
addition, collecting images from multiple flies (hundreds)
allows us to quantitate the level of natural variability in
segmentation and the experimental error in collecting this
data.
More and more laboratories (including those engaged in the Drosophila Genome Project) are presenting images of embryos from confocal scanning, for example, [3, 4] (see http://urchin.spbcas.ru/Mooshka/ and
http://www.fruitfly.org/). All workers in this area face image


Figure 1: An example of an expression pattern image and its 3D reconstruction for Drosophila. These images show the first indications of body segmentation in the embryo. (a) An image of a developing fruit-fly egg under a light microscope. The egg is shaped like a prolate ellipsoid. Dark dots are nuclei located just under the egg surface. There are about 3000 nuclei in this image. The nuclei are scanned to visualize the amount of one of the segmentation gene products (even-skipped or eve) at each nucleus. The darker the nucleus, the greater the local concentration of eve. (b) A reconstructed 3D picture showing the arrangement of nuclei and visualizing the eve pattern in a yellow-red-black palette.

processing challenges in reconstructing expression profiles from the results of confocal microscopy.

In this paper, we review problems in the field of processing confocal images of Drosophila gene expression and present our processing techniques based on genetic algorithms (GAs). We will discuss their efficacy in decreasing observational errors and visualizing natural variability in gene expression patterns.
2. PROBLEMS AND APPROACHES FOR INTEGRATING DATA SETS FROM RAW IMAGES

Sources of variability in our images can be roughly subdivided into natural embryo variability in size and shape, natural expression pattern variability, errors of image processing
procedures, experimental errors (fixation, dyeing), observational errors (confocal scanning), and the molecular noise of
expression machinery.

Figure 2: Embryos of the same time class and the same length have different expression patterns. Eve stripes differ in spacing and overall domain along the anteroposterior (AP, x-) axis, and show stripe curvature in the dorsoventral (DV, y-) direction.

2.1. Size and shape

Early embryos of isogenic fruit flies can differ in length by 30%. Regardless of such differences in size, expression patterns for segmentation remain qualitatively the same. This is a classic case of scaling in biological pattern formation; the final pattern is not dependent on embryo size (at least within the limits of natural size variability). However, integration of data from different flies requires size standardization.

Size variability was resolved by image preprocessing with the Khoros package [5]. After a cropping procedure, each image was rescaled to the same length and width. Relative units of percent egg length are used.

2.2. Expression pattern variability

Even after cropping and rescaling, there is still variation in the positioning and proportions of expression patterns for the same gene at the same developmental stage (Figure 2). To match two images such as Figures 2a and 2b (in order to make integrated datasets), we use 2D elastic deformations. We treat separately the dorsoventral (DV) curvature differences and the AP spacing differences [6]. First, we perform a 2D elastic deformation to straighten segmentation stripes. This step minimizes the DV contribution to the AP patterning, especially to AP variability. Next, on a pairwise basis, we move (in 1D) the stripes into register along the AP axis, minimizing the variability in stripe spacing and overall expression domain. These two steps make for a tough optimization procedure, which is probably best solved with modern heuristic approaches such as GAs [6].
2.3. Scanning error

After the above processing, images still have variability in fluorescence intensity due to experimental conditions. With image processing, we can address experimental or observational

Figure 3: An example of the systematic DV distortion of an expression surface, with the gene Krüppel.

errors which have a systematic character. Due to the ellipsoidal geometry of the egg, nuclei in the center of the image
(along the AP axis) are closer to the microscope objective and
look brighter than nuclei at the top and bottom of the image.
Intensity shows a DV dependence (Figure 3). The brightness
depends (roughly) quadratically on DV distance from the AP
midline. We flatten this DV bias by a procedure of expression
surface stretching.
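As a toy illustration of such a quadratic DV bias and its removal (this is not the authors' procedure, which fits the full polynomial (3) by a GA; here an exact synthetic quadratic is fitted by least squares):

```python
import numpy as np

# Synthetic nuclei: brightness falls off quadratically with DV distance
# from the AP midline (at y = 50% egg width), on top of a true signal of 100.
rng = np.random.default_rng(1)
y = rng.uniform(0, 100, 500)
z = 100.0 - 0.02 * (y - 50.0) ** 2

# Estimate the bias with a quadratic least-squares fit in y and remove it,
# restoring the flat profile (referenced to the midline brightness).
coeffs = np.polyfit(y, z, 2)
flattened = z - np.polyval(coeffs, y) + np.polyval(coeffs, 50.0)
print(round(flattened.std(), 6))   # → 0.0 (exact quadratic, no noise)
```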
Figure 4 summarizes the three steps of image processing
which follow the scaling: stripe straightening, stripe registration, and expression surface stretching. The details of the
processing techniques are in Section 3.
After image processing, we can generate an integrated
dataset and begin to address questions regarding the segmentation patterning dynamics. We are pursuing two problems initially. First, we are visualizing the maturation of the
expression patterns for all segmentation genes over the patterning period. Second, since we have removed many of the
sources of variability in the images, what remains should be
largely indicative of intrinsic, molecular scale fluctuations in
protein concentrations. We are comparing relative noise levels within the segmentation signaling hierarchy. These are
some of the first tests of theoretical predictions for noise
propagation in segmentation signaling [7, 8]. In general,
both of these approaches should provide tests of existing theories for segment patterning.
3. METHODS

3.1. Confocal scanning of developing Drosophila eggs


Gene expression was measured using fluorescently-tagged antibodies as described in [9]. For each embryo, a 1024 × 1024 pixel image with 8 bits of fluorescence data in each of 3 channels was obtained (Figure 5). To obtain the data in terms of nuclear location, an image segmentation procedure was applied [10].


Figure 4: Steps for processing large sets of images to obtain an integrated dataset of segmentation pattern dynamics (a pair of images
used in this example). Stripe straightening minimizes the DV contribution to the AP patterning. Stripe registration minimizes the
variability in AP stripe positioning. Expression surface stretching
minimizes systematic observational errors in the DV direction.

The segmentation procedure transforms the image into an ASCII table containing a series of data records, one for each nucleus. (About 2500–3500 nuclei are described for each image.) Each nucleus is characterized by a unique identification number, the x- and y-coordinates of its centroid, and the average fluorescence levels of three gene products. At present, over 1000 images have been scanned and processed. Our dataset contains data from embryos stained for 14 gene products. Each embryo was stained for eve (Figures 1 and 2) and two other genes.
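Such per-nucleus records can be read with a short parser; the column layout below (identifier, centroid coordinates, three fluorescence means) follows the description above, though the exact file format is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class Nucleus:
    """One record of the per-nucleus table: identifier, centroid, and
    average fluorescence of the three scanned gene products."""
    ident: int
    x: float            # AP centroid coordinate
    y: float            # DV centroid coordinate
    fluor: tuple        # three mean fluorescence levels (8-bit, 0-255)

def parse_table(text):
    """Parse a whitespace-separated nucleus table into Nucleus records."""
    nuclei = []
    for line in text.strip().splitlines():
        fields = line.split()
        nuclei.append(Nucleus(int(fields[0]),
                              float(fields[1]), float(fields[2]),
                              tuple(float(v) for v in fields[3:6])))
    return nuclei

demo = """\
1  42.1  55.3  120.0  30.5  12.0
2  43.7  54.9  180.2  28.1  15.4
"""
print(parse_table(demo)[1].fluor[0])   # → 180.2
```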
Time classification
All embryos under study belong to cleavage cycle 14 [11].
This cycle is about an hour long and is characterized by a
rapid transition of the pair-rule gene expression patterns,
which culminates in the formation of 7 stripes. The embryos
were classified into eight time classes primarily by observation of the eve pattern. This classification was later verified
by observation of the other patterns and by membrane invagination data.

Figure 5: An example of an embryo separately dyed and scanned for three gene products.

3.2. Deformations by polynomial series

Our three main deformations introduced above (stripe straightening, registration, and surface stretching) are based on polynomial series. Due to the character of segmentation pattern variability, our deformations are reminiscent of an earlier attempt by Thompson [12] to quantitatively describe the mechanism of shape change. Stripe straightening looks quite similar to his famous image of a puffer fish to Mola mola fish transformation. This visually simple graphical technique was explicitly described by Bookstein [13, 14]. We have found that Drosophila segmentation patterns can also be related by such simple transformation functions.

The stripe-straightening procedure is a transformation of the AP, x-coordinate by the following polynomial:

x′ = Axy^2 + Bx^2y + Cxy^3 + Dx^2y^2,  (1)

where x = w − w0, y = h − h0, w and h are initial spatial coordinates, and w0, h0, A, B, C, and D are parameters. The y-coordinate remains the same while the x-coordinate is transformed as a function of both coordinates w and h (for details, see [6, 15, 16]). The parameters w0, h0, A, B, C, and D for each image are found by means of GAs.

Our pairwise image registration procedure is the next step in the sequential transformation of the x-coordinate. We use the following polynomial for x′:

x′ = c0 + c1x + c2x^2 + c3x^3 + c4x^4 + c5x^5,  (2)

where c0, c1, c2, c3, c4, and c5 are parameters found by means of GAs for each image (for details, see [6, 16]).

Complete registration is achieved by sequential application of the polynomial transformations (1) and (2) to pairs of images. Complete registration within each time class relative to a starting image (the time class exemplar) gives sets of images suitable for constructing integrated datasets. If we then compare results across time classes, we are able to visualize detailed pattern dynamics over cell cycle 14.

The starting images in each time class, the time class exemplars, were chosen in the following way: the distance between each (stripe-straightened) image and every other (stripe-straightened) image in a time class was calculated using the registration cost function (see Section 3.3). These costs were summed for each image, and the image with the lowest total cost was used as the starting image. All other images in the time class were registered to this image. The starting image was unaffected by the registration transformation [6].

We perform (fluorescence intensity) surface stretching to decrease DV distortion using the following polynomial:

Z′ = Z + C1Y + C2Y^2 + C3XY + C4Y^3 + C5XY^2 + C6X^2Y,  (3)

where Z is expression, X = w − W0, Y = h − H0, w and h are initial spatial coordinates, and W0, H0, C1, C2, C3, C4, C5, and C6 are parameters found by means of GAs. Note that W0 and H0 generally differ from w0 and h0 in expression (1).

The computing time for finding parameters by optimization techniques is comparable for the three polynomial transformations (1), (2), and (3), though stripe straightening (1) is the most time intensive [6, 15, 16].
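As a sketch of how transformations (1), (2), and (3) act on the segmented data (function names and the parameter-vector layout are illustrative, and whether polynomial (1) gives the new coordinate directly or a displacement added to x is a convention detailed in [6]; the sketch treats it as a displacement):

```python
import numpy as np

# Illustrative parameter-vector layouts:
#   stripe straightening (1): [w0, h0, A, B, C, D]
#   registration (2):         [c0, c1, c2, c3, c4, c5]
#   surface stretching (3):   [W0, H0, C1, C2, C3, C4, C5, C6]

def straighten(xw, yh, p):
    """Transformation (1): shift the AP coordinate by a polynomial in both
    coordinates; the DV coordinate is left unchanged."""
    w0, h0, A, B, C, D = p
    x, y = xw - w0, yh - h0
    return xw + A*x*y**2 + B*x**2*y + C*x*y**3 + D*x**2*y**2

def register(xw, p):
    """Transformation (2): a 1D quintic remapping of the AP coordinate."""
    return sum(c * xw**k for k, c in enumerate(p))

def stretch(z, xw, yh, p):
    """Transformation (3): additive correction of intensity Z along DV."""
    W0, H0, c1, c2, c3, c4, c5, c6 = p
    X, Y = xw - W0, yh - H0
    return z + c1*Y + c2*Y**2 + c3*X*Y + c4*Y**3 + c5*X*Y**2 + c6*X**2*Y

# Identity check: zero polynomial coefficients leave the data unchanged.
x = np.array([10.0, 50.0, 90.0])
print(straighten(x, 30.0, [50, 50, 0, 0, 0, 0]))   # → [10. 50. 90.]
```

In the paper itself the parameters of all three polynomials are found by the GA described next, not set by hand.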
3.3. Optimization by GAs

We tested several techniques for optimization of (1) and (2): GAs, simplex, and a hybrid of these [6, 16]. Fitting polynomial coefficients is fairly routine and can be solved with any GA library. All we need is to define cost functions for our three particular tasks.

We used a standard GA approach in a classic evolutionary strategy (ES). ES was developed by Rechenberg [17] and Schwefel [18] for computer solution of optimization problems. ES algorithms consider the individual as the object to be optimized. The character data of the individual are the parameters to be optimized in an evolutionary-based process. These parameters are arranged as vectors of real numbers for which operations of crossover and mutation are defined.

In the GA, the program operates on a population of floating-point chromosomes. At each step, the program evaluates every chromosome according to a cost function (below). Then, according to a truncation strategy, an average score is calculated. Copies of chromosomes with scores exceeding the average replace all chromosomes with scores less than average. After this, a predetermined proportion of the chromosome population undergoes mutation, in which one of the coefficients gets a small increment. This whole cycle is repeated until a desired level of optimization is achieved.
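The loop just described can be sketched as follows (a minimal illustration of truncation selection with per-coefficient mutation on real-valued chromosomes, minimizing an arbitrary cost function; the population size, mutation rate, and step size are illustrative choices, not the authors' settings):

```python
import random

def evolve(cost, n_params, pop_size=50, mut_rate=0.3, step=0.05,
           n_generations=200, seed=0):
    """Truncation-selection ES sketch: chromosomes with below-average cost
    replace those with above-average cost, then a fraction of the
    population gets one coefficient perturbed by a small increment."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(n_generations):
        costs = [cost(ch) for ch in pop]
        avg = sum(costs) / len(costs)
        winners = [ch for ch, c in zip(pop, costs) if c <= avg] or pop
        # Refill the population with copies of the winners (truncation).
        pop = [list(winners[i % len(winners)]) for i in range(pop_size)]
        # Mutate a predetermined proportion of the chromosomes.
        for ch in pop:
            if rng.random() < mut_rate:
                k = rng.randrange(n_params)
                ch[k] += rng.uniform(-step, step)
    return min(pop, key=cost)

# Toy check: minimize the distance to the point (0.5, -0.25).
best = evolve(lambda c: (c[0] - 0.5)**2 + (c[1] + 0.25)**2, n_params=2)
print(best)  # prints a chromosome near [0.5, -0.25]
```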

D-V axis

828

EURASIP Journal on Applied Signal Processing


UNIX, and in Borland and DEC Pascal. Details of the EO0.8.5 C++ library implementation have been published [6,
16].

90
80
70
60
50
40
30
20
10

4.

A-P axis

Figure 6: Scheme of image stripping for cost function calculation.

3.3.1 Cost function for stripe straightening


The following procedure evaluates chromosomes during the
GA calculation for stripe straightening. Each image was subdivided into a series of longitudinal strips (Figure 6). Each
strip is subdivided into bins, and a mean brightness (local
fluorescence level) is calculated for each bin. Each row of
means gives a profile of local brightness along each strip.
The cost function is computed by pairwise comparison of
all profiles and summing the squares of dierences between
the strips. The task of the stripe-straightening procedure is to
minimize this cost function.

EFFICACY OF IMAGE PROCESSING

As discussed in the introduction, fluorescence intensity measurements demonstrate high variability and are subject to diverse observational and experimental errors. Our aim with
the image processing is to decrease some of the observational
and experimental errors and help distinguish these from the
natural variability which we would like to study (i.e., characterization of the stochastic nature of molecular processes in
this gene network). We will discuss the ecacy of the image
processing by comparison of initial and residual variability in
our data.
4.1.

Stripe straightening and registration

Both functions were applied to a row of expression levels


at each nucleus (Z), ranked according to DV position (ycoordinate) while the x-coordinate was ignored. Argument
Z j is a given nucleus fluorescence level and Z j+1 and Z j 1 are
fluorescence levels for its two nearest (DV) neighbors. Our
tests show that F1 is better for our purposes.

With transformations (1) and (2), we aim at as good a match


as possible (by heuristic optimizations) between the data
within a time class. Figure 7a shows a superposition of about
hundred eve expression surfaces after stripe straightening
and registration. (The intensity data is discrete at nuclear resolution but we display some of our results as continuously
interpolated expression surfaces.)
Embryo-to-embryo variability of the expression pattern
for the first ten zygotic segmentation genes we are studying is
similar to that for eve. Because of the two-dimensionality of
the expression surface and the irregularity of nuclear distribution, quantitative comparison of this variability is a tough
biometric task.
One way to simplify the problem is to compare representative cross-sections through the expression surface along
the midline of an embryo in the AP direction (e.g., Figure 6,
center strip). For all nuclei with centroids located between
50% and 60% embryo width (DV position), expression levels were extracted and ranked by AP coordinate. This array of
250350 nuclei gives an AP transect through the expression
surface [19].
Using these transects, we can measure the eect on
embryo-to-embryo variability of our processing steps.
Figure 7b shows the variability after rescaling and stripe
straightening (before complete registration) for about a
hundred eve expression profiles from the 8th time class
(Figure 7c). Intensity means at each AP position are shown
with error bars (standard deviation). Minimizing stripe spacing variability, by registration, reduces the error bars significantly (Figures 7d and 7e). In addition to molecular-level
fluctuations in gene expression, one of the remaining sources
of error in Figures 7d and 7e may be experimental variability in intensity (from fixing and dying procedures, as well
as variability in microscope scanning), estimated at 1015%
of the 0255 intensity scale. Normalization of this variability
may require both image processing and empirical solutions.

3.3.4 Implementation

4.2.

GA-based programs for our three tasks were implemented


both in EO-0.8.5 C++ library [4] for DOS/Windows and

The true expression of eve in early cycle 14 is uniform.


Due to systematic distortions in intensity data, however, the

3.3.2 Cost function for registration


To evaluate the similarity of a registering image to the reference image (time class exemplar), we use an approach similar to the previous one. We take longitudinal strips from
the midlines of the registering and reference images (e.g.,
Figure 6, centre strip). The strips are subdivided into bins
and mean brightness calculated for each bin. Each row of
means gives the local brightness profile along each embryo.
The cost function is computed by comparing the profiles and
summing the squares of dierences between them. Registration proceeds until this cost is minimized.
3.3.3 Cost function for surface stretching
To minimize distortion of the (fluorescence intensity) expression surface along the DV direction (y-coordinate), we
tested two cost functions based on discrete approximations
of first- and second-order derivatives in y:
F1 =
F2 =



Z j Z j+1



2

2Z j Z j+1 Z j 1

2

(4)
.

Expression surface stretching

Drosophila Gene Expression Image Processing

829

250

Fluorescence

250
200
150
100
50
0

150

100

p osi

tion

,%

EL

65
60
55
50
45
40
35
30

50

D-V

Fluorescence

200

30

40
50
60
70
80
A-P position in % of egg length

30

90

40

50
60
70
AP position (% egg length)

(a)

80

90

(b)
250

Fluorescence

200

Fluorescence

300
250
200
150
100
50

150

100

50

0
50

11

21

31

41

51

61

71

81

0
30

91

40

AP position (% egg length)

50
60
70
AP position (% egg length)

(c)

80

(d)
300

Fluorescence

250
200
150
100
50
0
50

21

41

61

81

AP position (% egg length)


(e)

Figure 7: Superposition of about a hundred images for eve gene expression from time class 8 (late cycle 14). (a) Superposition of all
eve expression surfaces after the stripe straightening and registration. (b) Variability of expression profiles for gene eve after the stripestraightening procedure. (c) Mean intensity at each AP position, with standard deviation error bars for the expression profiles from (b). (d)
Residual variability for the same dataset after stripe straightening and registration. (e) Mean intensity with standard deviation error bars for
the expression profiles from (d). These have decreased significantly with stripe registration. Data for the 1D profiles is extracted from 10%
(DV) longitudinal strips (e.g., Figure 6, center strip). Cubic spline interpolation was used to display discrete data.

expression surface for such an embryo looks like a half ellipsoid (Figures 8a and 8b). The fluorescence level at the edges
of the image is about 20 arbitrary units, while in the center it

is about 60 units. (The expression surface follows the geometry of the embryo as illustrated in Figure 1b.) Even in eve null
mutants, background fluorescence shows this distortion.
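The AP transects and per-AP-position statistics used in Figures 7 and 11 can be sketched as follows (not the authors' code: nuclei are assumed given as NumPy arrays of AP position, DV position, and fluorescence, and the strip bounds and bin width are illustrative):

```python
import numpy as np

def transect_stats(x, y, z, dv_lo=45.0, dv_hi=55.0, bin_width=1.0):
    """Extract an AP transect from a DV longitudinal strip and return
    per-AP-bin centers, mean fluorescence, and standard deviation."""
    in_strip = (y >= dv_lo) & (y <= dv_hi)
    xs, zs = x[in_strip], z[in_strip]
    bins = np.floor(xs / bin_width).astype(int)
    centers, means, stds = [], [], []
    for b in np.unique(bins):
        vals = zs[bins == b]
        centers.append((b + 0.5) * bin_width)
        means.append(vals.mean())
        stds.append(vals.std())
    return np.array(centers), np.array(means), np.array(stds)

# Tiny synthetic check: two nuclei in one AP bin, one outside the DV strip.
x = np.array([30.2, 30.7, 30.4])
y = np.array([50.0, 52.0, 80.0])   # third nucleus lies outside the strip
z = np.array([100.0, 120.0, 999.0])
centers, means, stds = transect_stats(x, y, z)
print(means)   # → [110.]
```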


Figure 8: Surface stretching transformation. (a) and (b) Experimental expression surface and scatter plot, for a truly uniform distribution
of the eve gene product. (c) and (d) Expression surface and scatter plot after surface stretching, minimizing the systematic errors in intensity
data.

The stretching procedure transforms the expression surface along the DV, y-axis (Figures 8c and 8d). Minimizing
the systematic observational error in this direction gives us a
chance to directly observe nucleus-to-nucleus variability in a
single embryo (Figure 8c).
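The two smoothness costs of Section 3.3.3 that drive this stretching can be written compactly (a sketch; `z` is assumed to be the DV-ranked array of nuclear fluorescence values):

```python
import numpy as np

def f1(z):
    """First-derivative cost: sum of squared differences of DV neighbors."""
    return float(np.sum((z[:-1] - z[1:]) ** 2))

def f2(z):
    """Second-derivative cost: penalizes curvature along the DV ranking."""
    return float(np.sum((2 * z[1:-1] - z[2:] - z[:-2]) ** 2))

z = np.array([1.0, 2.0, 4.0, 7.0])
print(f1(z), f2(z))   # → 14.0 2.0
```

Minimizing F1 over the stretching parameters of (3) flattens the DV trend while leaving the nucleus-to-nucleus scatter, which is what the single-embryo statistics below rely on.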
5. RESULTS AND DISCUSSION

We have found heuristic optimization procedures (transformations (1), (2), and (3)) to be a simple and effective way to reduce observational errors in embryo images. This reduction of variability allows us to focus on the variability intrinsic to gene expression and the dynamics of patterning over cycle 14. Here, we give an overview of some of our results with processed datasets.
5.1. Integrated dataset

As mentioned in the introduction, dataset integration from multiple scanned embryos is necessary due to the impossibility of simultaneously staining embryos for all segmentation genes at once (the current limit is triple staining). Other work [19, 20] has begun to address the processing necessary to standardize images for dataset integration. Myasnikova et al. [19] have used transects, as in Figures 7b and 7c, and have done stripe registration of the profiles (with

a different method than ours). Our work adds the steps of stripe straightening and surface stretching, allowing for the construction of 2D expression surfaces and integrated datasets (Figure 9). These steps also minimize contributions to AP variability from DV sources, clarifying the task of studying molecular sources of intensity variability.

More such processed segmentation patterns are posted and updated on the website HOX Pro (http://www.iephb.nw.ru/hoxpro, [21]) and the web-resource DroAtlas (http://www.iephb.nw.ru/spirov/atlas/atlas.html).

Figure 9: Part of an integrated dataset of gene expression in time class 8 (late cycle 14) for the gap genes hunchback (hb), giant (gt), Krüppel, and knirps (kni) and the pair-rule gene eve. Each surface is the gene expression for a time class exemplar (as discussed in Section 3).
5.2. Dynamics of profile maturation

Any analysis of the formation of gene expression patterns must address the striking dynamics over cycle 14. Especially in early cycle 14, these patterns are quite transient, only settling down around mid-cycle 14 to the segmentation pattern. Comparative analysis of pattern dynamics for the pair-rule genes is particularly important. Essential questions on the mechanisms underlying these striped patterns are still open [22, 23].

The only way to trace the patterning in sufficient detail to address these questions is to integrate large sets of embryo images over these developmental stages. (Time ranking within cycle 14 is not a simple task. Presently, it takes an expert to rank images into time classes. We are developing automated software for ranking, to be published elsewhere.) AP profiles which have been registered can be integrated into composite pictures like Figure 10, which plots AP distance horizontally against time (at the 8 time class resolution) vertically, with intensity in the outward direction.

Figure 10 allows us to examine a number of features of cycle 14 expression dynamics. Gap genes tend to establish sharp spatial boundaries earlier than the pair-rule genes. Pair-rule genes are initially expressed in broad domains, which later partition into seven stripes. The regularity of the

late cycle pattern is well covered in the literature, but the details of the early dynamics are not so well characterized.

All five genes show a movement towards the middle of the embryo, with anterior expression domains moving posteriorly and posterior domains moving anteriorly. In more detail, the small anterior domain of knirps (white arrowhead) appears to move posteriorly at the same speed as eve stripe 1 (also marked by a white arrowhead). It appears that we can see interactions between hb and gt in the posterior: a posterior gt peak forms first, but as posterior hb forms, the gt peak moves anteriorly. This interaction appears to be reflected in the movement of stripe 7 of eve and h (black arrowheads). We hope that further study of the correlation between expression domains over cycle 14 and observation of the fine gene-specific details of domain dynamics will serve to test theories of pattern formation in Drosophila segmentation.

Figure 10: Three-dimensional diagrams representing the dynamics of AP profiles of expression for the gap genes gt, hb, and kni and the pair-rule genes eve and hairy (h). The horizontal coordinate is the spatial AP axis (from left to right); the vertical coordinate is the time axis (from top to bottom); the expression axis is perpendicular to the plane of the diagrams. White numbers mark individual stripes of eve and hairy.

5.3. Nucleus-to-nucleus variability

Pictures like Figure 7c give us glimpses into the molecular-level fluctuations existing in this gene network. However, such data still display variability in scanning between embryos and over time with the experimental procedure. With stripe straightening and surface stretching, we have a chance to look at nucleus-to-nucleus variability in single embryos, eliminating many sources of experimental error. (The drawback is that we are limited to triple-stained embryos.)

Figure 11a shows the maternal protein bicoid (bcd) (exponential) and the expression of eve (single peak, the future eve stripe 1) for a single embryo in early cycle 14. This image was made from a 50% DV longitudinal strip so that the observed variation at any AP position is that in the DV direction (e.g., along a stripe). Each dot is the intensity for a single nucleus. The variation in this plot is largely due to natural, molecular-level fluctuations in gene expression. At this developmental stage, we can see that overall noise is comparable between the genes, but the anterior edge of the eve stripe is relatively well controlled. Figure 11b shows means and standard deviations at each AP position. We are using this type of data to address how noise is propagated and filtered in the segmentation network (to appear elsewhere).

Figure 11: Eve and bcd fluorescence scatterplots and profiles (early cycle 14, time class 1), sampled from a 50% DV longitudinal strip. (a) Scatterplots after stripe straightening and surface stretching. Each dot is the intensity for a single nucleus. (b) Curves of mean intensity at each AP position, with standard deviation error bars.

To conclude, we have applied image processing steps to minimize particular sources of experimental and observational error in the scanned images of segmentation gene expression. Cropping and scaling address embryo size variability. Stripe straightening eliminates variable DV contributions to the AP pattern. Registration minimizes differences in expression domains and spacing for pair-rule genes. Expression surface stretching minimizes systematic observational error along the y-axis. The combination of these procedures allows us to create composite 2D expression surfaces for the segmentation genes, allowing us to investigate pattern dynamics over cycle 14. Also, these procedures allow us to do single-embryo statistics, eliminating many sources of experimental variability in order to address molecular-level noise in the genetic machinery.

ACKNOWLEDGMENT

The work of AS is supported by the USA National Institutes of Health, Grant RO1-RR07801, INTAS Grant 97-30950, and RFBR Grant 00-04-48515.


Alexander Spirov is an Adjunct Associate Professor in the Department of Applied Mathematics and Statistics and the Center for Developmental Genetics at the State University of New York at Stony Brook, Stony Brook, New York. Dr. Spirov was born in St. Petersburg, Russia. He received his M.S. degree in molecular biology in 1978 from St. Petersburg State University, St. Petersburg, Russia. He received his Ph.D. in the area of biometrics in 1987 from Irkutsk State University, Irkutsk, Russia. His research interests are in computational biology and bioinformatics, web databases, data mining, artificial intelligence, evolutionary computation, animats, artificial life, and evolutionary biology. He has published about 80 papers in these areas.

David M. Holloway is an instructor of mathematics at the British Columbia Institute of Technology and a Research Associate in chemistry at the University of British Columbia, Vancouver, Canada. His research is focused on the formation of spatial pattern in developmental biology (embryology) in animals and plants. Topics include the establishment and maintenance of differentiation states, coupling between chemical pattern and tissue growth for the generation of shape, and the effects of molecular noise on spatial precision. This work is chiefly computational (the solution of partial differential equation models for developmental phenomena), but also includes data analysis for body segmentation in the fruit fly. He received his Ph.D. in physical chemistry from the University of British Columbia in 1995, and did postdoctoral fellowships there and at the University of Copenhagen and Simon Fraser University.

EURASIP Journal on Applied Signal Processing 2003:8, 834-840
© 2003 Hindawi Publishing Corporation


A Comparison of Evolutionary Algorithms for Tracking Time-Varying Recursive Systems
Michael S. White
Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK
Email: mike@whitem.com

Stuart J. Flockton
Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK
Email: s.flockton@rhul.ac.uk
Received 28 June 2002 and in revised form 29 November 2002

A comparison is made of the behaviour of some evolutionary algorithms in time-varying adaptive recursive filter systems. Simulations show that an algorithm including random immigrants outperforms a more conventional algorithm using the breeder genetic algorithm as the mutation operator when the time variation is discontinuous, but neither algorithm performs well when the time variation is rapid but smooth. To address this deficiency, a new hybrid algorithm which uses a hill climber as an additional genetic operator, applied for several steps at each generation, is introduced. A comparison is made of the effect of applying the hill-climbing operator a few times to all members of the population or a larger number of times solely to the best individual; it is found that applying it to the whole population yields the better results, substantially improved compared with those obtained using earlier methods.

Keywords and phrases: recursive filters, evolutionary algorithms, tracking.

1. INTRODUCTION

Many problems in signal processing may be viewed as system identification. A block diagram of a typical system identification configuration is shown in Figure 1. The information available to the user is typically the input and the noise-corrupted output signals, x(n) and a(n), respectively, and the aim is to identify the properties of the unknown system by, for example, putting an adaptive filter of a suitable structure in parallel to the unknown system and altering the parameters of this filter to minimise the error signal ε(n). When the nature of the unknown system requires pole-zero modelling, there is a difficulty in adjusting the parameters of the adaptive filter: the mean square error (MSE) is a nonquadratic function of the recursive filter coefficients, so the error surface of such a filter may have local minima as well as the global minimum that is being sought. The ability of evolutionary algorithms (EAs) to find global minima of multimodal functions has led to their application in this area [1, 2, 3, 4].

All these authors have considered only time-invariant unknown systems. However, in many real-life applications, time variations are an ever-present feature. In noise or echo cancellation, for example, the unknown system represents the path between the primary and reference microphones. Movements inside or outside of the recording environment cause the characteristics of this filter to change with time. The system to be identified in an HF transmission system corresponds to the varying propagation path through the atmosphere. Hence there is an interest in investigating the applicability of evolutionary adaptive system identification algorithms to tracking time-varying recursive systems.

Previous work on the use of EAs in time-varying systems has been published in [5, 6, 7, 8, 9], but none of these deal with system identification of recursive systems. After explaining our choice of filter structure in Section 3, we go on in Section 4 to compare the performance of the EA introduced in [4] with that of the algorithm in [7]. We show that while both can cope reasonably well with slow variations in the system parameters, the approach of [7] is more successful in the case of discontinuous changes, but neither copes well where the variation is smooth but fairly rapid (the distinction between slow and rapid variation is explained quantitatively in Section 3.1). In Section 5, we propose a new hybrid algorithm which embeds what is in effect a hill-climbing operator within the EA and show that this new algorithm is much more successful for the difficult problem of tracking rapid variations.


Figure 1: System identification. [Block diagram: the input x(n) drives both the unknown system H(z), whose output y(n) plus noise w(n) gives a(n), and the adaptive filter Ĥ(z), whose output ŷ(n) is subtracted from a(n) to give the error ε(n).]

Figure 2: Pole-zero lattice filter.

2. GENETIC ALGORITHMS IN CHANGING ENVIRONMENTS

The standard genetic algorithm (GA), with its strong selection policy and low rate of mutation, quickly eliminates diversity from the population as it proceeds. In typical function
optimization applications, where the environment remains
static, we are not usually concerned with the population diversity at later stages of the search, so long as the best or mean
value of the population fitness is somewhere near to an acceptable value. However, when the function to be optimized
is nonstationary, the standard GA runs into considerable
problems once the population has substantially converged
on a particular region of the search space. At this point, the
GA is eectively reliant on the small number of random mutations, occurring each generation, to somehow redirect its
search to regions of higher fitness since standard crossover
operators are ineective when the population has become
largely homogeneous. This view is borne out by Pettits and
Swiggers study [10] in which a Holland-type GA was compared to cognitive (statistical predictive) and random pointmutation models in a stochastically fluctuating environment.
In all cases, the GA performed poorly in tracking the changing environment even when the rate of fluctuation was slow.
An approach to providing EAs capable of functioning well in
time-varying systems is the mutation-based strategy adopted
by Cobb and Grefenstette [5, 6, 7]. In this approach, population diversity is sustained either by replacing a proportion of
the standard GAs population with randomly generated individuals, the random immigrants strategy, or by increasing
the mutation rate when the performance of the GA degrades
(triggered hypermutation). Cobbs hypermutation operator is
adaptive, briefly increasing the mutation rate when it detects
that a degradation of performance (measured as a running
average of the best performing population members over five
generations) has occurred. However, it is easy to contrive categories of environmental change which would not trigger the
hypermutable state. On continuously changing functions,
the hypermutation GA has a greater variance in its tracking
performance than either the standard or random immigrants
GA. In oscillating environments, where the changes are more
drastic, the high mutation level of the hypermutation GA
destroys much of the information contained in the current

population. Consequently, when the environment returns to


its prior state, the GA has to locate the previous optimum
from scratch.
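The triggered-hypermutation idea can be sketched as follows. This is a hedged illustration, not Cobb's exact operator: the class name, the default rates, and the simple "running average dropped" test are assumptions made for the example.

```python
from collections import deque

class HypermutationTrigger:
    """Sketch of a hypermutation trigger: raise the mutation rate when the
    running average of the best fitness over the last five generations drops."""

    def __init__(self, base_rate=0.01, hyper_rate=0.5, window=5):
        self.base = base_rate
        self.hyper = hyper_rate
        self.history = deque(maxlen=window)
        self.prev_avg = None

    def rate(self, best_fitness):
        # Update the running average of the best population member.
        self.history.append(best_fitness)
        avg = sum(self.history) / len(self.history)
        degraded = self.prev_avg is not None and avg < self.prev_avg
        self.prev_avg = avg
        # Briefly switch to the high mutation rate only while degrading.
        return self.hyper if degraded else self.base
```

As the text notes, such a trigger is easy to defeat: any environmental change that does not depress the best-fitness running average leaves the GA at its low base mutation rate.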
3. CHOICE OF RECURSIVE FILTER STRUCTURE

One of the main difficulties encountered in recursive adaptive systems is the fact that the system can become unstable if the coefficients are unconstrained. With many filter structures, it is not immediately obvious whether any particular set of coefficients will result in the presence of a pole outside the unit circle, and hence instability. On the other hand, it is important that the adaptive algorithm is able to cover the entire stable coefficient space, so it is desirable to adopt a structure which makes this possible at the same time as making stability monitoring easy. It is for this reason that the pole-zero lattice filter [11] was adopted for this work. A block diagram of the filter structure is given in Figure 2.

The input-output relation of the filter is given by

    y(n) = Σ_{i=0}^{N} ν_i(n) B_i(n),    (1)

where ν_i(n) are the feed-forward tap weights and F_i(n) and B_i(n) are the forward and backward residuals given by

    B_i(n) = B_{i-1}(n-1) + k_i(n) F_{i-1}(n),    i = 1, 2, . . . , N,
    F_{i-1}(n) = F_i(n) - k_i(n) B_{i-1}(n-1),    i = N, . . . , 1,
    F_N(n) = x(n),    B_0(n) = F_0(n),    (2)

with k_i(n) the reflection (feedback) coefficients. It can be shown that a necessary and sufficient condition for all of the roots of the pole polynomial to lie within the unit circle is |k_i| < 1, i = 1, . . . , N, so the stability of candidate models can be guaranteed merely by restricting the range over which the feedback coefficients are allowed to vary. Since this must be done when implementing the GA anyway, the ability to maintain filter stability is essentially obtained without cost.
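The lattice recursion and its |k_i| < 1 stability check can be sketched as follows. This is a hedged illustration, not the authors' code; the function name and the conventions that `k[i-1]` holds k_i and `v[i]` holds the tap weight on B_i are assumptions.

```python
import numpy as np

def lattice_filter(x, k, v):
    """Pole-zero lattice filter following equations (1)-(2).

    x : input samples x(n)
    k : reflection (feedback) coefficients k_1..k_N; stability requires |k_i| < 1
    v : feed-forward tap weights on the backward residuals B_0..B_N
    """
    N = len(k)
    if not np.all(np.abs(k) < 1.0):
        raise ValueError("unstable: stability requires |k_i| < 1")
    B_prev = np.zeros(N + 1)               # B_0(n-1) .. B_N(n-1)
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        F = np.empty(N + 1)
        B = np.empty(N + 1)
        F[N] = xn                           # F_N(n) = x(n)
        for i in range(N, 0, -1):           # F_{i-1}(n) = F_i(n) - k_i B_{i-1}(n-1)
            F[i - 1] = F[i] - k[i - 1] * B_prev[i - 1]
        B[0] = F[0]                         # B_0(n) = F_0(n)
        for i in range(1, N + 1):           # B_i(n) = B_{i-1}(n-1) + k_i F_{i-1}(n)
            B[i] = B_prev[i - 1] + k[i - 1] * F[i - 1]
        y[n] = v @ B                        # equation (1)
        B_prev = B
    return y
```

With all k_i = 0 the structure degenerates to a tapped delay line, so an impulse input simply reads out the tap weights v_i in sequence, which is a convenient sanity check.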
3.1. Quantifying time variations in the system being tracked

Work on the tracking performance of LMS, detailed in [12], employs the concept of the nonstationarity degree to embody the notions of both the size and speed of time variations. The nonstationarity degree d(n) is defined as

    d(n) = √( E[t(n)²] / ε_min(n) ),    (3)

where t(n) is the output noise caused by the time variations in the unknown system and ε_min(n) is the output noise power in the absence of time variations in the system.
Having devised a metric incorporating both the speed and size of time variations, Macchi [12] goes on to describe three distinct classes of nonstationarity. Slow variations are those in which the nonstationarity degree is much less than one, that is, the variation noise is masked by the measurement noise. For the LMS adaptive filter, slow changes to the plant impulse response are seen to be easy to track since the time variations need not be estimated very accurately. This class of time variations is further subdivided into two groups in which the unknown filter coefficients undergo deterministic or random evolution patterns. Rapid variations (d(n) permanently greater than one), however, present a much greater problem to LMS and LS adaptive filters. In the case of time-varying line enhancement at low signal-to-noise ratio, where the frequency of the sinusoidal input signal is chirped, Macchi et al. state that the "slow adaptation/slow variation condition implies an upper limit for the chirp rate. This limit is the level above which the misadjustment is larger than the original additive noise. The noisy signal is thus a better estimate of the sinusoid than the adaptive system output. The slow adaptation condition is therefore required, in practice, to implement the adaptive system" [13, page 360].

In the case of LMS adaptive and inverse adaptive modelling, adaptive filters cannot track time variations which are so rapid that d(n) is permanently greater than one. Indeed, "within a single iteration, the algorithm cannot acquire the new optimal filter Ĥ(n+1), starting from Ĥ(n)" [12, page 298].
As a consequence, only a special subset of rapid time variations is generally considered in the context of LMS filter adaptation. The jump class of nonstationarity produces scarce, large changes in the unknown filter impulse response. Hence jump variations are defined as variations where occasionally

    d(n) ≫ 1,    (4)

but otherwise,

    d(n) ≪ 1.    (5)

In this case, "occasionally" is defined as a period of time long enough for the algorithm to achieve the steady state where the error is approximately equal to the additive noise.
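Equation (3) and Macchi's taxonomy can be illustrated with a small sketch. The function names and the 0.1 cutoff for "slow" are illustrative assumptions; the text only requires d ≪ 1 for slow variations and d permanently greater than 1 for rapid ones.

```python
import numpy as np

def nonstationarity_degree(t, eps_min):
    """Equation (3): d = sqrt(E[t(n)^2] / eps_min), where t(n) is the output
    noise caused by plant time variations and eps_min is the output noise
    power measured with a time-invariant plant."""
    return float(np.sqrt(np.mean(np.asarray(t, dtype=float) ** 2) / eps_min))

def classify(d, slow_threshold=0.1):
    """Rough classification following Macchi: d << 1 slow, d > 1 rapid."""
    if d < slow_threshold:
        return "slow"
    if d > 1.0:
        return "rapid"
    return "intermediate"
```

Under this classification, the two test environments used later in the paper (d = 0.03 and d = 1.6) fall into the slow and rapid classes, respectively.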
4. RANDOM IMMIGRANTS AND BGA-TYPE ALGORITHMS

In this section, the performance of two genetic adaptive algorithms operating in a variety of nonstationary environments is investigated. The first algorithm is the modified genetic adaptive algorithm described in [4]. The lattice coefficients are encoded as floating-point numbers and the mutation operator used is that from the breeder genetic algorithm (BGA) described in [14]. This scheme randomly chooses, with probability 1/32, one of the 32 points (±2⁻¹⁵A, ±2⁻¹⁴A, . . . , ±2⁰A), where A defines the mutation range and is, in these simulations, set to 0.1 × the coefficient range. The crossover operator involved selecting two parent filter structures at random and generating identical copies. Two cut points were randomly selected and the coefficients lying between these limits were swapped between the offspring. The newly generated lattice filters were then inserted into the population, replacing the two parent structures.
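The BGA mutation step and the two-point crossover just described can be sketched as follows. This is a hedged illustration under stated assumptions: the function names are invented, the coefficient range is taken as [-1, 1] as in the test systems below, and clipping back into that range is the example's own choice.

```python
import random

COEFF_RANGE = 2.0              # coefficients span [-1, 1]
A = 0.1 * COEFF_RANGE          # mutation range A = 0.1 x the coefficient range

def bga_mutate(coeff):
    """BGA mutation: add one of the 32 offsets +/-(2^0)A .. +/-(2^-15)A,
    each chosen with equal probability 1/32."""
    step = 2.0 ** -random.randrange(16) * A
    if random.random() < 0.5:
        step = -step
    return max(-1.0, min(1.0, coeff + step))   # stay inside the coefficient range

def two_point_crossover(p1, p2):
    """Copy two parents, pick two cut points, swap the coefficients between them."""
    c1, c2 = list(p1), list(p2)
    i, j = sorted(random.sample(range(len(c1) + 1), 2))
    c1[i:j], c2[i:j] = c2[i:j], c1[i:j]
    return c1, c2
```

The key property of the BGA operator is visible in the code: even late in the run it retains a 1/16 chance of taking a full-size step ±A, which is what gives the algorithm a chance of jumping after a changed system.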
A measure of fitness of the new filter was obtained by calculating the MSE for a block of current input and output data. A block length of 10 input-output pairs was used for the experiments reported below on a slowly varying system, while a length of 5 input-output pairs was used for the rapidly varying system. Fitness scaling was used, as described in Goldberg [15, page 77], and fitness-proportional selection was implemented using Baker's stochastic universal sampling algorithm [16]. Elitism was used to preserve the best-performing individual from each generation. Crossover and mutation rates were set to 0.1 and 0.6, respectively, and the population contained 400 models. It was hoped that the use of the BGA mutation scheme would give this algorithm a greater ability to follow system changes than that of a GA using a more conventional mutation scheme, as the BGA scheme retains, even when the population has substantially converged, a significant probability of making substantial changes to the coefficients if the system being modelled is found to have changed.
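Baker's stochastic universal sampling, mentioned above, can be sketched as follows (a hedged illustration; the function name and the index-list return convention are assumptions). A single random offset places n equally spaced pointers over the cumulative fitness wheel, so every individual is selected either ⌊e⌋ or ⌈e⌉ times, where e is its expected copy count.

```python
import random

def sus_select(fitnesses, n):
    """Stochastic universal sampling: one spin, n equally spaced pointers.
    Returns the indices of the n selected individuals."""
    total = sum(fitnesses)
    spacing = total / n
    start = random.uniform(0.0, spacing)          # single random offset
    pointers = [start + i * spacing for i in range(n)]
    chosen, cumulative, i = [], 0.0, 0
    for p in pointers:                             # pointers are ascending
        while cumulative + fitnesses[i] < p:       # advance the wheel
            cumulative += fitnesses[i]
            i += 1
        chosen.append(i)
    return chosen
```

Unlike repeated roulette-wheel spins, this keeps the sampling variance minimal, which matters when selection pressure must be controlled in a small, noisy fitness landscape.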
The random immigrants mechanism of Cobb and Grefenstette, discussed above, was placed in competition with this genetic optimizer. For this set of simulation experiments, 20% of the population was replaced by randomly generated individuals every 10 generations. The same controlling parameters were used for both GAs.
4.1. The test systems

Deterministically varying environments were produced by making nonrandom alterations to the coefficients of a sixth-order all-pole lattice filter. In the case of slow and rapid time variations, the lattice coefficients were varied in a sinusoidal or cosinusoidal fashion taking in the full extent of the coefficient range (±1). Changes to the plant coefficients were effected at every sample instant, with the precise magnitude of these variations reflected in the value of d for each environment. With measurement noise suitably scaled to give a signal-to-noise ratio of approximately 40 dB, the nonstationarity degrees of the slow and rapidly varying systems are 0.03 and 1.6, respectively.

Traditional (nonevolutionary) adaptive algorithms can run into problems when called upon to track rapid time variations (d permanently greater than one). When these changes occur infrequently, however, the well-documented


Figure 3: Performance of the genetic adaptive algorithm in a rapidly varying environment (d = 1.6). [NMSE (dB) against generations for the standard GA and the random immigrants GA.]

transient behaviour of the adaptive algorithm can be used to describe the time to convergence and the excess MSE that results. In order to investigate the performance of the genetic adaptive algorithm under such conditions, an environment was constructed in which the time variations of the plant coefficients are occasional and are often large in magnitude. The system to be modelled was once again a sixth-order all-pole filter. The infrequent time variations were introduced by periodically negating one of the plant lattice coefficients. As a consequence, for much of the simulation, the unknown system is time invariant (d = 0), with the nonstationarity degree greater than zero only during the occasional step changes.

4.2. Results

The performance of the BGA-based algorithm and the random immigrants GA was evaluated in each of the three time-varying environments detailed. In each case, fifty GA runs were performed using the same environment (time-varying system).

In both the slowly changing and the jump environments, the behaviour was more or less as expected. In the slowly changing environment, both algorithms were able to reduce the error to near the -40 dB noise floor (set by the level of noise added to the system), and inspection of the parameters shows them to be following the changes in the system well. In the case of the step changes, the random immigrants algorithm exhibited better behaviour, recovering more quickly when the system changed. The tracking of rapid changes, however, is more difficult than either of these, and hence of more interest, and here neither of the algorithms is particularly successful. The error reduction performance of the two adaptive algorithms is illustrated in Figure 3.

Figure 4: Genetic adaptive algorithm tracking performance in a rapidly varying environment (d = 1.6). [Time evolution of the first three direct-form plant coefficients a1, a2, a3 for the standard GA, the random immigrants GA, and the true coefficient values.]

In addition to rapid small-scale excursions resulting from the use of blocked input-output data, the extent to which the unknown system is correctly identified fluctuates on a more macroscopic scale. The normalised mean square error (NMSE) varies between the theoretical minimum of -40 dB and a maximum of around -8 dB, eventually settling down to a mean of around -20 dB.

These phenomena can be explained when one looks at a graph of the coefficient tracking performance (Figure 4). The graph shows the time evolutions of the first three direct-form coefficients of the plant (represented by a dotted line) and of the best adaptive filter in the population. The coefficients generated by the standard floating-point GA are depicted by a gray line whilst those produced by the random immigrants GA are represented by a black line. Neither the standard floating-point GA nor the random immigrants GA was able to track the rapid variations in the plant coefficients throughout the entire run. The periods when the best adaptive filter coefficient values differed significantly from the optimal values correspond, in both cases, to the times when the identification was poor (see Figure 3).
5. HYBRID GENETIC ALGORITHMS

Clearly, an algorithm which was better able to track rapid changes in system parameters would be useful. A possible method is to devise a hybrid algorithm combining the global properties of the GA with a local search method to follow the local variations in the parameters. In this way, the two major failings of the individual components of the hybrid can be addressed. The GA is often capable of finding reasonable solutions to quite difficult problems, but its characteristically slow finishing is legendary. Conversely, the huge array of gradient-based and gradientless local search techniques run the risk of becoming hopelessly entangled in local optima. In combining these two methodologies, the hybrid GA has been shown to produce improvements in performance over the constituent search techniques in certain problem domains [17, 18, 19, 20].
Goldberg [15, page 202] discusses a number of ways in
which local search and GAs may be hybridized. In one configuration, the hybrid is described in terms of a batch scheme.
The GA is run long enough for the population to become
largely homogeneous. At this point, the local optimization
procedure takes over and continues the search, from perhaps the best 5 or 10% of solutions in the population, until improvement is no longer possible. This method allows
the GA to determine the gross features of the solution space,
hopefully resulting in convergence to the basin of attraction
around the global optimum, before switching to a technique
better suited to fine tuning of the solutions. An alternative
approach is to embed the local search within the framework
of the GA, treating it rather like another genetic operator.
This is the scheme adopted by Kido et al. [18] (who combine GA, simulated annealing, and TABU search), Bersini
and Renders [20] (whose GA incorporates a hill-climbing
operator), and Miller et al. [19] (who employ a variety of
problem-specific local improvement operators). This second
hybrid configuration is better suited to the identification of
time-varying systems. In this case, the local search heuristic
is embedded within the framework of the EA and is treated
as another genetic operator. The local optimization scheme is
enabled for a certain number of iterations at regular intervals
in the GA run.
The hybrid approach utilizes a random hill-climbing technique to perform periodic local optimization. This procedure is ideally suited to incorporation in the EA since it does not require calculation of gradients or any other auxiliary information. Instead, the same evaluation function can be employed to determine the merit of the newly sampled points in the coefficient space. Since the technique is greedy, the locally optimized solution is always at least as good as its genetic predecessor. In addition, once a change in the unknown system has occurred and been detected by a degradation of the model's performance, no new data samples are required. The hill-climbing method incorporated here into the GA is the random search technique proposed by Solis and Wets [21]. This algorithm randomly generates a new search point from a normal distribution centred about the current coefficient set. The standard deviation σ_k of the distribution is expanded or contracted in relation to the success of the algorithm in locating better-performing models. If the first-chosen new point is not an improvement on the original point, the algorithm tests another point the same distance away in exactly the opposite direction.

In detail, the structure of the algorithm as used here is as follows. Firstly, the parameter σ_k is updated: it is increased by a factor of 2 if the previous 5 iterations have all yielded improved fitness, decreased by a factor of 2 if the previous 3 iterations have all failed to find an improved fitness, and left unchanged if neither of these conditions has been met. In the second step, a new candidate point in coefficient space is obtained from a normal distribution of standard deviation σ_k centred on the current point. The fitness of this new point is then evaluated. If the fitness is improved, the new point is retained and becomes the current point; if the fitness is not improved, the point an equal distance away in the opposite direction is tested and, if better, it becomes the current point. If neither yields an improvement, the current point is kept and the algorithm returns to the first step.
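The iteration just described can be sketched as a single step function (a hedged illustration; the function name, the explicit success/failure counters, and the higher-is-better fitness convention are assumptions, not the authors' implementation):

```python
import random

def solis_wets_step(point, fitness, sigma, successes, failures):
    """One iteration of the Solis-Wets random hill climber described above.
    `fitness` is the same evaluation function used by the GA (higher = better).
    Returns the updated (point, sigma, successes, failures) search state."""
    # Step 1: adapt the step size sigma_k.
    if successes >= 5:                     # 5 consecutive improvements: expand
        sigma, successes = sigma * 2.0, 0
    elif failures >= 3:                    # 3 consecutive failures: contract
        sigma, failures = sigma / 2.0, 0
    # Step 2: sample a candidate from a normal distribution about the current
    # point; if it fails, try the reflected point on the opposite side.
    delta = [random.gauss(0.0, sigma) for _ in point]
    for cand in ([p + d for p, d in zip(point, delta)],
                 [p - d for p, d in zip(point, delta)]):
        if fitness(cand) > fitness(point):  # greedy: accept only improvements
            return cand, sigma, successes + 1, 0
    return point, sigma, 0, failures + 1    # keep the current point
```

Because acceptance is greedy, the fitness of the retained point never decreases from one iteration to the next, which is the property exploited when the step is embedded as a genetic operator.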
The use of this hybrid arrangement of EA and hill climber
introduces further control parameters into the adaptive system, namely, the number of structures to undergo local optimization and the number of iterations in each hill-climbing
episode. Two extremes were investigated. In the first, hybrid A, every model in the population underwent a limited
amount of hill climbing. The other configuration, hybrid B,
locally optimized only the best structure in the population at
each generational step. In order to allow for direct comparison with the results in the previous section, the population
size was reduced so that there would be approximately the
same number of function evaluations in each case. For hybrid A, each model in a population of 100 underwent three
iterations of the hill-climbing algorithm at every generational
step while for hybrid B the population was set to 300 and
then the best at each generation was optimized over approximately 100 iterations of the random hill-climbing procedure.
Simulation experiments indicated that both hybrids were able to track the slowly varying environment, requiring fewer than two hundred generations to acquire near-optimal coefficient values. The smaller population size implemented in each case resulted in poorer initial performance, but this was offset by the increased rate of improvement brought about by the local hill-climbing operator. In the case of intermittent step changes in the unknown system characteristics, the performance of the two hybrids was observed to fall between that of the standard and random immigrants GAs. Figure 5 compares the tracking performance of these two hybrid GA configurations in a rapidly changing environment. Hybrid A (development of every individual) is represented by a gray line. The second hill-climbing/GA hybrid (development of the best individual) is shown by a black solid line. Although a slight bias in the estimated coefficients is sometimes in evidence, hybrid A is clearly able to track the qualitative behaviour of the plant coefficients. Development of the best individual, however, is not sufficient to induce reliable tracking, and the performance of hybrid B suffers as a result.

The addition of individual improvement within the EA framework has resulted in an adaptive algorithm which is able to track the coefficients of a rapidly varying system (d > 1) with some success. This is a feat which poses considerable problems to conventional adaptive algorithms (see Section 3.1). Wholesale local improvement was observed to

Figure 5: Genetic adaptive algorithm tracking performance in a rapidly varying environment (d = 1.6). [Time evolution of coefficients a1, a2, a3 for hybrid A (development of every individual), hybrid B (development of the best individual), and the true coefficient values.]

outperform the development of a single individual since this


latter technique leaves the remainder of the population trailing behind the best structure. As the nonstationarity degree
of the plant is increased, an adaptive algorithm relying solely
upon evolutionary principles will lag further behind the time
variations. This hybrid technique, however, permits the provision of greater local optimization flexibility (more iterations of the hill climber) when required.
Figure 6 illustrates the tracking performance of the hybrid GA subjected to a time-varying environment in which
the nonstationarity degree was three times greater than in
the previous experiment (d = 4.8). The population in this
case contained 400 models, each one undergoing ten local
optimization iterations at every generational step. The input-output block size was further reduced to just two samples
in order that the plant coefficients would not vary substantially within the duration of a data block. This resulted in
the coefficient estimates generated by the hybrid adaptive algorithm fluctuating about their trajectory to a greater extent. Individual evaluations of candidate models, however,
required far less computation. The overall tracking performance of the hybrid was observed to be less accurate in
this case, but the mean estimates of the time-varying plant
coefficients were observed to express the correct qualitative
behaviour.
Figure 6: Genetic adaptive algorithm tracking performance in a rapidly varying environment (d = 4.8).

With emphasis shifting away from the role of evolutionary improvement in the hybrid adaptive algorithm as the
time variations become more extreme, the balance of exploration versus exploitation (or global versus local search) is
altered. This highlights that no single adaptation scheme is
likely to outperform all others on every class of time-varying
problem. On slowly varying systems, for example, a more
or less conventional EA provided good performance. When
the unknown system was affected by intermittent but large-scale time variations, the wider-ranging search of the random immigrants operator was required. If the error surface is
multimodal, hill-climbing operators are unlikely to provide
the desired search characteristics. Conversely, with a rapidly
changing system, the fast local search engendered by the hill-climbing operator provides the necessary response since only
relatively minor changes to the optimal coefficients occur at
each generational step. However, this classification assumes
that the nature of the time variations affecting the unknown
system is known in advance. When such information is not
available or when more than one class of time variation is
present, some combination of techniques may be desirable.
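The hybrid scheme discussed here can be sketched roughly as follows: in each generation the individuals are first "developed" by a few hill-climbing iterations (every individual for hybrid A, only the best for hybrid B) before selection and variation take place. The population model, step sizes, and fitness below are illustrative assumptions, not the authors' implementation.

```python
import random

def hill_climb(x, fitness, iters=10, step=0.05):
    """Greedy local refinement: keep random perturbations that improve fitness."""
    best, best_f = x[:], fitness(x)
    for _ in range(iters):
        cand = [v + random.uniform(-step, step) for v in best]
        f = fitness(cand)
        if f > best_f:
            best, best_f = cand, f
    return best

def hybrid_ga_step(pop, fitness, develop_all=True, elite_frac=0.5):
    """One generation: local development, then truncation selection and mutation."""
    if develop_all:                      # hybrid A: develop every individual
        pop = [hill_climb(ind, fitness) for ind in pop]
    else:                                # hybrid B: develop only the best
        best = max(pop, key=fitness)
        pop = [hill_climb(best, fitness)] + [ind for ind in pop if ind is not best]
    pop.sort(key=fitness, reverse=True)
    parents = pop[: max(1, int(elite_frac * len(pop)))]
    children = [[v + random.gauss(0, 0.1) for v in random.choice(parents)]
                for _ in range(len(pop) - len(parents))]
    return parents + children
```

With `develop_all=True`, more hill-climbing iterations per generation shift the balance toward local search, which is the flexibility the text describes for rapidly varying plants.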
6. CONCLUSIONS

On system identification tasks where the plant coefficients
are changing slowly (d ≪ 1), both the floating-point GA
and the random immigrants GA were able to track the time
variations. However, when the time variations were infrequent but large in magnitude (jump variations), the standard
GA was unable to react quickly to the changes in the coefficient values; the random immigrants mechanism, on the
other hand, produced sufficient diversity in the population
to respond rapidly to such step-like time variations. Neither
algorithm was able to successfully track the plant coefficients

when the time variations were rapid and continuous (d > 1).
In the final section of the paper, a hybrid scheme is introduced and shown to be more effective than either of the earlier schemes for tracking these rapid variations.

REFERENCES
[1] D. M. Etter, M. J. Hicks, and K. H. Cho, "Recursive adaptive filter design using an adaptive genetic algorithm," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '82), vol. 2, pp. 635–638, IEEE, Paris, France, May 1982.
[2] R. Nambiar, C. K. K. Tang, and P. Mars, "Genetic and learning automata algorithms for adaptive digital filters," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '92), pp. 41–44, IEEE, San Francisco, Calif, USA, March 1992.
[3] K. Kristinsson and G. A. Dumont, "System identification and control using genetic algorithms," IEEE Trans. Systems, Man, and Cybernetics, vol. 22, no. 5, pp. 1033–1046, 1992.
[4] M. S. White and S. J. Flockton, "Adaptive recursive filtering using evolutionary algorithms," in Evolutionary Algorithms in Engineering Applications, D. Dasgupta and Z. Michalewicz, Eds., pp. 361–376, Springer-Verlag, Berlin, Germany, 1997.
[5] H. G. Cobb, "An investigation into the use of hypermutation as an adaptive operator in genetic algorithms having continuous, time-dependent nonstationary environments," Tech. Rep. 6760, Navy Center for Applied Research in Artificial Intelligence, Washington, DC, USA, December 1990.
[6] J. J. Grefenstette, "Genetic algorithms for changing environments," in Proc. 2nd International Conference on Parallel Problem Solving from Nature (PPSN II), R. Männer and B. Manderick, Eds., pp. 137–144, Elsevier, Amsterdam, September 1992.
[7] H. G. Cobb and J. J. Grefenstette, "Genetic algorithms for tracking changing environments," in Proc. 5th International Conference on Genetic Algorithms (ICGA '93), S. Forrest, Ed., pp. 523–530, Morgan Kaufmann, San Mateo, Calif, USA, July 1993.
[8] A. Neubauer, "A comparative study of evolutionary algorithms for on-line parameter tracking," in Proc. 4th International Conference on Parallel Problem Solving from Nature (PPSN IV), H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, Eds., pp. 624–633, Springer-Verlag, Berlin, Germany, September 1996.
[9] F. Vavak, T. C. Fogarty, and K. Jukes, "A genetic algorithm with variable range of local search for tracking changing environments," in Proc. 4th International Conference on Parallel Problem Solving from Nature (PPSN IV), H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, Eds., pp. 376–385, Springer-Verlag, Berlin, Germany, September 1996.
[10] E. Pettit and K. M. Swigger, "An analysis of genetic-based pattern tracking and cognitive-based component tracking models of adaptation," in Proc. National Conference on Artificial Intelligence (AAAI '83), pp. 327–332, Morgan Kaufmann, San Mateo, Calif, USA, August 1983.
[11] A. H. Gray Jr. and J. D. Markel, "Digital lattice and ladder filter synthesis," IEEE Transactions on Audio and Electroacoustics, vol. 21, no. 6, pp. 491–500, 1973.
[12] O. Macchi, Adaptive Processing: The Least Mean Squares Approach with Applications in Transmission, John Wiley & Sons, Chichester, UK, 1995.
[13] O. Macchi, N. Bershad, and M. Mboup, "Steady-state superiority of LMS over LS for time-varying line enhancer in noisy environment," IEE Proceedings F, vol. 138, no. 4, pp. 354–360, 1991.

EURASIP Journal on Applied Signal Processing


[14] H. Mühlenbein and D. Schlierkamp-Voosen, "Predictive models for the breeder genetic algorithm I. Continuous parameter optimization," Evolutionary Computation, vol. 1, no. 1, pp. 25–49, 1993.
[15] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing, Reading, Mass, USA, 1989.
[16] J. E. Baker, "Reducing bias and inefficiency in the selection algorithm," in Genetic Algorithms and Their Applications: Proc. 2nd International Conference on Genetic Algorithms (ICGA '87), J. J. Grefenstette, Ed., pp. 14–21, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, July 1987.
[17] H. Mühlenbein, M. Schomisch, and J. Born, "The parallel genetic algorithm as a function optimizer," in Proc. 4th International Conference on Genetic Algorithms (ICGA '91), R. K. Belew and L. B. Booker, Eds., pp. 271–278, Morgan Kaufmann, University of California, San Diego, Calif, USA, July 1991.
[18] T. Kido, H. Kitano, and M. Nakanishi, "A hybrid search for genetic algorithms: combining genetic algorithms, TABU search, and simulated annealing," in Proc. 5th International Conference on Genetic Algorithms (ICGA '93), S. Forrest, Ed., p. 641, Morgan Kaufmann, University of Illinois, Urbana-Champaign, Ill, USA, July 1993.
[19] J. A. Miller, W. D. Potter, R. V. Gandham, and C. N. Lapena, "An evaluation of local improvement operators for genetic algorithms," IEEE Trans. Systems, Man, and Cybernetics, vol. 23, no. 5, pp. 1340–1351, 1993.
[20] H. Bersini and J.-M. Renders, "Hybridizing genetic algorithms with hill-climbing methods for global optimization: two possible ways," in Proc. 1st IEEE Conference on Evolutionary Computation (ICEC '94), D. B. Fogel, Ed., vol. I, pp. 312–317, IEEE, Piscataway, NJ, USA, June 1994.
[21] F. J. Solis and R. J.-B. Wets, "Minimization by random search techniques," Mathematics of Operations Research, vol. 6, no. 1, pp. 19–30, 1981.
Michael S. White was a student at Royal Holloway, University of
London, where he received the B.S. and Ph.D. degrees. He is currently employed by a New York-based hedge fund.
Stuart J. Flockton received the B.S. and Ph.D. degrees from the
University of Liverpool. He is a Senior Lecturer at Royal Holloway,
University of London. His research interests centre around signal
processing and evolutionary algorithms.

EURASIP Journal on Applied Signal Processing 2003:8, 841–859
© 2003 Hindawi Publishing Corporation

A Domain-Independent Window Approach to Multiclass Object Detection Using Genetic Programming

Mengjie Zhang
School of Mathematical and Computing Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
Email: mengjie@mcs.vuw.ac.nz

Victor B. Ciesielski
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne, 3001 Victoria, Australia
Email: vc@cs.rmit.edu.au

Peter Andreae
School of Mathematical and Computing Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
Email: pondy@mcs.vuw.ac.nz
Received 30 June 2002 and in revised form 7 March 2003
This paper describes a domain-independent approach to the use of genetic programming for object detection problems in which
the locations of small objects of multiple classes in large images must be found. The evolved program is scanned over the large
images to locate the objects of interest. The paper develops three terminal sets based on domain-independent pixel statistics
and considers two different function sets. The fitness function is based on the detection rate and the false alarm rate. We have
tested the method on three object detection problems of increasing difficulty. This work not only extends genetic programming
to multiclass object detection problems, but also shows how to use a single evolved genetic program for both object classification
and localisation. The object classification map developed in this approach can be used as a general classification strategy in genetic
programming for multiple-class classification problems.
Keywords and phrases: machine learning, neural networks, genetic algorithms, object recognition, target detection, computer
vision.

1. INTRODUCTION

As more and more images are captured in electronic form,


the need for programs which can find objects of interest in
a database of images is increasing. For example, it may be
necessary to find all tumors in a database of x-ray images,
all cyclones in a database of satellite images, or a particular
face in a database of photographs. The common characteristic of such problems can be phrased as: given subimage1,
subimage2, . . . , subimagen, which are examples of the objects
of interest, find all images which contain this object and its
location(s). Figure 10 shows examples of problems of this
kind. In the problem illustrated by Figure 10b, we want to
find centers of all of the Australian 5-cent and 20-cent coins
and determine whether the head or the tail side is up. Examples of other problems of this kind include target detection
problems [1, 2, 3], where the task is to find, say, all tanks,
trucks, or helicopters in an image. Unlike most of the current work in the object recognition area, where the task is to
detect only objects of one class [1, 4, 5], our objective is to
detect objects from a number of classes.
Domain independence means that the same method will
work unchanged on any problem, or at least on some range
of problems. This is very difficult to achieve at the current
state of the art in computer vision because most systems require careful analysis of the objects of interest and a determination of which features are likely to be useful for the detection task. Programs for extracting these features must then
be coded or found in some feature library. Each new vision
system must be handcrafted in this way. Our approach is to
work from the raw pixels directly, or to use easily computed
pixel statistics such as the mean and variance of the pixels
in a subimage, and to evolve the programs needed for object
detection.
Several approaches have been applied to automatic object detection and recognition problems. Typically, they use
multiple independent stages, such as preprocessing, edge detection, segmentation, feature extraction, and object classification [6, 7], which often results in some efficiency and effectiveness problems. The final results rely too much upon the
results of earlier stages. If some objects are lost in one of the
early stages, it is very difficult or impossible to recover them
in a later stage. To avoid these disadvantages, this paper introduces a single-stage approach.
There have been a number of reports on the use of genetic programming (GP) in object detection and classification [8, 9]. Winkeler and Manjunath [10] describe a GP
system for object detection in which the evolved functions
operate directly on the pixel values. Teller and Veloso [11]
describe a GP system and a face recognition application in
which the evolved programs have a local indexed memory.
All of these approaches are based on detecting one class of
objects or two-class classification problems, that is, objects
versus everything else. GP naturally lends itself to binary
problems as a program output of less than 0 can be interpreted as one class and greater than or equal to 0 as the other
class. It is not obvious how to use GP for more than two
classes. The approach in this paper will focus on object detection problems in which a number of objects in more than
two classes of interest need to be localised and classified.
1.1. Outline of the approach to object detection
A brief outline of the method is as follows.
(1) Assemble a database of images in which the locations
and classes of all of the objects of interest are manually
determined. Split these images into a training set and
a test set.
(2) Determine an appropriate size (n × n) of a square
which will cover all single objects of interest to form
the input field.
(3) Invoke an evolutionary process with images in the
training set to generate a program which can determine the class of an object in its input field.
(4) Apply the generated program as a moving window
template to the images in the test set and obtain the
locations of all the objects of interest in each class. Calculate the detection rate (DR) and the false alarm rate
(FAR) on the test set as the measure of performance.
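Step (4) can be sketched as a simple sliding-window loop. In this sketch, `classify` stands in for the evolved program and `background` for the non-object class; both names, and the window and step parameters, are illustrative assumptions rather than details from the paper.

```python
def detect_objects(image, classify, window, step=1, background=0):
    """Slide an n x n window over the image; record the window centre and
    class wherever the evolved program reports a non-background class."""
    rows, cols = len(image), len(image[0])
    detections = []
    for r in range(0, rows - window + 1, step):
        for c in range(0, cols - window + 1, step):
            label = classify(image, r, c, window)
            if label != background:
                centre = (r + window // 2, c + window // 2)
                detections.append((centre, label))
    return detections
```

The detections can then be compared against the manually determined object locations to compute the DR and FAR.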
1.2. Goals
The overall goal of this paper is to investigate a learning/adaptive, single-stage, and domain-independent approach to multiple-class object detection problems without
any preprocessing, segmentation, or specific feature extraction. This approach is based on a GP technique. Rather
than using specific image features, pixel statistics are used
as inputs to the evolved programs. Specifically, the following
questions will be explored on a sequence of detection problems of increasing difficulty to determine the strengths and
limitations of the method.
(i) What image features involving pixels and pixel statistics would make useful terminals?



(ii) Will the 4 standard arithmetic operators be sufficient
for the function set?
(iii) How can the fitness function be constructed, given that
there are multiple classes of interest?
(iv) How will performance vary with increasing difficulty
of image detection problems?
(v) Will the performance be better than a neural network
(NN) approach [12] on the same problems?
1.3. Structure

The remainder of this paper gives a brief literature survey,
then describes the main components of this approach, including the terminal set, the function set, and the fitness function. After describing the three image databases used here, we
present the experimental results and compare them with an
NN method. Finally, we analyse the results and the evolved
programs and present our conclusions.
2. LITERATURE REVIEW

2.1. Object detection

The term object detection here refers to the detection of small
objects in large images. This includes both object classification and object localisation. Object classification refers to the
task of discriminating between images of different kinds of
objects, where each image contains only one of the objects of
interest. Object localisation refers to the task of identifying the
positions of all objects of interest in a large image. The object
detection problem is similar to the commonly used terms automatic target recognition and automatic object recognition.
We classify the existing object detection systems along
three dimensions, based on whether the approach is segmentation free or not, domain independent or specific, and on
the number of object classes of interest in an image.
2.1.1 Segmentation-based versus single stage

According to the number of independent stages used in the


detection procedure, we divide the detection methods into
two categories.
(i) Segmentation-based approach, which uses multiple independent stages for object detection. Most research on object detection involves 4 stages: preprocessing, segmentation,
feature extraction, and classification [13, 14, 15], as shown in
Figure 1. The preprocessing stage aims to remove noise or
enhance edges. In the segmentation stage, a number of coherent regions and suspicious regions which might contain objects are usually located and separated from the entire
images. The feature extraction stage extracts domain-specific
features from the segmented regions. Finally, the classification stage uses these features to distinguish the classes of
the objects of interest. The algorithms or methods for these
stages are generally domain specific. Learning paradigms,
such as NNs and genetic algorithms/programming, have
usually been applied to the classification stage. In general,
each independent stage needs a program to fulfill that specific task and, accordingly, multiple programs are needed for
object detection problems. Success at each stage is critical

Multiclass Object Detection Using Genetic Programming

[Figure 1 sketch: source databases → (1) preprocessing → (2) segmentation → (3) feature extraction → (4) classification.]

Figure 1: A typical procedure for object detection.

to achieving good final detection performance. Detection of


trucks and tanks in visible, multispectral infrared, and synthetic aperture radar images [2], and recognition of tanks in
cluttered images [6] are two examples.
(ii) Single-stage approach, which uses only a single stage
to detect the objects of interest in large images. There is only a
single program produced for the whole object detection procedure. The major property of this approach is that it is segmentation free. Detecting tanks in infrared images [3] and
detecting small targets in cluttered images [16] based on a
single NN are examples of this approach.
While most recent work on object detection problems
concentrates on the segmentation-based approach, this paper focuses on the single-stage approach.
2.1.2 Domain-specific approach versus domain-independent approach
In terms of the generalisation of the detection systems, there
are two major approaches.
(i) Domain-specific object detection, which uses specific
image features as inputs to the detector or classifier. These
features, which are usually highly domain dependent, are extracted from entire images or segmented images. In a lentil
grading and quality assessment system [17], for example, features such as brightness, colour, size, and perimeter are extracted and used as inputs to an NN classifier. This approach
generally involves a time-consuming investigation of good
features for a specific problem and a handcrafting of the corresponding feature extraction programs.
(ii) Domain-independent object detection, which usually
uses the raw pixels directly (no features) as inputs to the
detector or classifier. In this case, feature selection, extraction, and the handcrafting of corresponding programs can
be completely removed. This approach usually needs learning and adaptive techniques to learn features for the detection task. Directly using raw image pixel data as input to
NNs for detecting vehicles (tanks, trucks, cars, etc.) in infrared images [1] is such an example. However, long learning/evolution times are usually required due to the large
number of pixels. Furthermore, the approach generally requires a large number of training examples [18]. A special
case is to use a small number of domain-independent, pixel
level features (referred to as pixel statistics) such as the mean
and variance of some portions of an image [19].
2.1.3 Multiple class versus single class
Regarding the number of object classes of interest in an image, there are two main types of detection problems.
(i) One-class object detection problem, where there are
multiple objects in each image; however, they all belong to a single class. One special case in this category is that there is only
one object of interest in each source image. In nature, these
problems contain a binary classification problem: object versus nonobject, also called object versus background. Examples
are detecting small targets in thermal infrared images [16]
and detecting a particular face in photograph images [20].
(ii) Multiple-class object detection problem, where there
are multiple object classes of interest, each of which has multiple objects in each image. Detection of handwritten digits
in zip code images [21] is an example of this kind.
It is possible to view a multiclass problem as a series of binary problems. A problem with objects of 3 classes of interest
can be implemented as class 1 against everything else, class 2
against everything else, and class 3 against everything else.
However, these are not independent detectors: some method of dealing with situations where two detectors report an
object at the same location must be provided.
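One common way to resolve such conflicts, sketched below, is to assume each binary detector returns a real score (positive meaning its class is present) and to report the class with the largest positive score; this particular tie-breaking rule is an illustrative assumption, not a method from the paper.

```python
def one_vs_rest(detectors, image, r, c, w):
    """Combine per-class binary detectors over one window. Each detector
    returns a real score; positive means its class is present. Conflicts
    are resolved by the largest score; None means background."""
    scores = {cls: det(image, r, c, w) for cls, det in detectors.items()}
    cls, s = max(scores.items(), key=lambda kv: kv[1])
    return cls if s > 0 else None
```

A single multiclass program, as developed in this paper, avoids the need for such per-location arbitration altogether.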
In general, multiple-class object detection problems are
more difficult than one-class detection problems. This paper
is focused on detecting multiple objects from a number of
classes in a set of images, which is particularly difficult. Most
research in object detection which has been done so far belongs to the one-class object detection problem.
2.2. Performance evaluation

In this paper, we use the DR and FAR to measure the performance of multiclass object detection problems. The DR
refers to the number of small objects correctly reported by a
detection system as a percentage of the total number of actual objects in the image(s). The FAR, also called false alarms
per object or false alarms/object [16], refers to the number
of nonobjects incorrectly reported as objects by a detection
system as a percentage of the total number of actual objects
in the image(s). Note that the DR is between 0 and 100%,
while the FAR may be greater than 100% for difficult object
detection problems.
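In code, the two measures amount to simple ratios over the number of actual objects; the helper below is a small illustrative sketch of the definitions, not code from the paper.

```python
def detection_rates(n_correct, n_false_alarms, n_actual):
    """Detection rate and false alarm rate, both expressed as
    percentages of the number of actual objects in the image(s)."""
    dr = 100.0 * n_correct / n_actual
    far = 100.0 * n_false_alarms / n_actual
    return dr, far
```

Because the FAR is normalised by the number of actual objects rather than the number of reports, it has no upper bound, which is why values well above 100% occur on cluttered images.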
The main goal of object detection is to obtain a high DR
and a low FAR. There is, however, a trade-off between them
for a detection system. Trying to improve the DR often results
in an increase in the FAR, and vice versa. Detecting objects in
images with very cluttered backgrounds is an extremely difficult problem where FARs of 200–2000% (i.e., the detection
system suggests up to 20 times as many objects as
there really are) are common [5, 16].
Most research which has been done in this area so far only
presents the results of the classification stage (only the final
stage in Figure 1) and assumes that all other stages have been
properly done. However, the results presented in this paper
are the performance for the whole detection problem (both
the localisation and the classification).

2.3. Related work: GP for object detection
Since the early 1990s, there has been only a small amount
of work on applying GP techniques to object classification,
object detection, and other vision problems. This, in part,
reflects the fact that GP is a relatively young discipline compared with, say, NNs.
2.3.1 Object classification
Tackett [9, 22] uses GP to assign detected image features to a
target or nontarget category. Seven primitive image features
and twenty statistical features are extracted and used as the
terminal set. The 4 standard arithmetic operators and a logic
function are used as the function set. The fitness function is
based on the classification result. The approach was tested
on US Army NVEOD Terrain Board imagery, where vehicles,
such as tanks, need to be classified. The GP method outperformed both an NN classifier and a binary tree classifier on
the same data, producing lower rates of false positives for the
same DRs.
Andre [23] uses GP to evolve functions that traverse an
image, calling upon coevolved detectors in the form of hit-miss matrices to guide the search. These hit-miss matrices
are evolved with a two-dimensional genetic algorithm. These
evolved functions are used to discriminate between two letters or to recognise single digits.
Koza in [24, Chapter 15] uses a turtle to walk over a
bitmap landscape. This bitmap is to be classified either as a
letter L, a letter I, or neither of them. The turtle has access to the values of the pixels in the bitmap by moving over
them and calling a detector primitive. The turtle uses a decision tree process, in conjunction with negative primitives, to
walk over the bitmap and decide which category a particular
landscape falls into. Using automatically defined functions as
local detectors and a constrained syntactic structure, some
perfect scoring classification programs were found. Further
experiments showed that detectors can be made for different
sizes and positions of letters, although each detector has to
be specialised to a given combination of these factors.
Teller and Veloso [11] use a GP method based on the
PADO language to perform face recognition tasks on a
database of face images in which the evolved programs have
a local indexed memory. The approach was tested on a
discrimination task between 5 classes of images [25] and
achieved up to 60% correct classification for images without
noise.
Robinson and McIlroy [26] apply GP techniques to the
problem of eye location in grey-level face images. The input data from the images is restricted to a 3000-pixel block
around the location of the eyes in the face image. This approach produced promising results over a very small training set, up to 100% true positive detection with no false positives, on a three-image training set. Over larger sets, however, the GP
approach performed less well and could not match
the performance of NN techniques.
Winkeler and Manjunath [10] produce genetic programs
to locate faces in images. Face samples are cut out and
scaled, then preprocessed for feature extraction. The statistics gleaned from these segments are used as terminals in GP,
which evolves an expression returning how likely a pixel is
to be part of a face image. Separate experiments process the
grey-scale image directly, using low-level image processing
primitives and scale-space filters.
2.3.2 Object detection

All of the reported GP-based object detection approaches belong to the one-class object detection category. In these detection problems, there is only one object class of interest in the
large images.
Howard et al. [19] present a GP approach to automatic
detection of ships in low-resolution synthetic aperture radar
imagery. A number of random integer/real constants and
pixel statistics are used as terminals. The 4 arithmetic operators and min and max operators constitute the function
set. The fitness is based on the number of the true positive
and false positive objects detected by the evolved program.
A two-stage evolution strategy was used in this approach. In
the first stage, GP evolved a detector that could correctly distinguish the target (ship) pixels from the nontarget (ocean)
pixels. The best detector was then applied to the entire image and produced a number of false alarms. In the second
stage, a brand new run of GP was tasked to discriminate between the clear targets and the false alarms as identified in the
first stage and another detector was generated. This two-stage
process resulted in two detectors that were then fused using
the min function. These two detectors return a real number,
which if greater than zero denotes a ship pixel, and if zero or
less denotes an ocean pixel. The approach was tested on images chosen from commercial SAR imagery, a set of 50 m and
100 m resolution images of the English Channel taken by the
European Remote Sensing satellite. One of the 100 m resolution images was used for training, two for validation, and two
for testing. The training was quite successful with perfect DR
and no false alarms, while there was only one false positive
in each of the two test images and the two validation images
which contained 22, 22, 48, and 41 true objects.
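The min-based fusion described above can be expressed directly: a pixel is declared a ship only when both evolved detectors return a positive score. The detector signatures below are hypothetical stand-ins for the evolved programs.

```python
def fused_detector(d1, d2):
    """Fuse a stage-one and a stage-two detector with min: a pixel is
    labelled a ship only if BOTH detectors return a positive score."""
    def fused(pixel_features):
        return min(d1(pixel_features), d2(pixel_features))
    return fused
```

Using min means the second-stage detector can veto the false alarms that survive the first stage, which is exactly the role it was evolved for.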
Isaka [27] uses GP to locate mouth corners in small
(50 × 40) images taken from images of faces. Processing each
pixel independently using an approach based on relative intensities of surrounding pixels, the GP approach was shown
to perform comparably to a template matching approach on
the same data.
A list of object detection related work based on GP is
shown in Table 1.
3. GP ADAPTED TO MULTICLASS OBJECT DETECTION

3.1. The GP system

In this section, we describe our approach to a GP system for


multiple-class object detection problems. Figure 2 shows an
overview of this approach, which has a learning process and
a testing procedure. In the learning/evolutionary process, the
evolved genetic programs use a square input field which is
large enough to contain each of the objects of interest. The
programs are applied in a moving window fashion to the

Table 1: Object detection-related work based on GP.

Problems | Applications | Authors | Year | Source
Object classification | Tank detection (classification) | Tackett | 1993 | [9]
Object classification | Tank detection (classification) | Tackett | 1994 | [22]
Object classification | Letter recognition | Andre | 1994 | [23]
Object classification | Letter recognition | Koza | 1994 | [24]
Object classification | Face recognition | Teller and Veloso | 1995 | [11]
Object classification | Face recognition | Winkeler and Manjunath | 1997 | [10]
Object classification | Small target classification | Stanhope and Daida | 1998 | [28]
Object classification | Shape recognition | Teller and Veloso | 1995 | [25]
Object classification | Eye recognition | Robinson and McIlroy | 1995 | [26]
Object detection | Ship detection | Howard et al. | 1999 | [19]
Object detection | Mouth detection | Isaka | 1997 | [27]
Object detection | Small target detection | Benson | 2000 | [29]
Object detection | Vehicle detection | Howard et al. | 2002 | [30]
Other vision problems | Edge detection | Lucier et al. | 1998 | [31]
Other vision problems | San Mateo trail problem | Koza | 1992 | [32]
Other vision problems | San Mateo trail problem | Koza | 1993 | [33]
Other vision problems | Image analysis | Howard et al. | 2001 | [34]
Other vision problems | Image analysis | Poli | 1996 | [35]
Other vision problems | Model interpretation | Lindblad et al. | 2002 | [36]
Other vision problems | Stereoscopic vision | Graae et al. | 2000 | [37]
Other vision problems | Image compression | Nordin and Banzhaf | 1996 | [38]

entire images in the training set to detect the objects of interest. In the test procedure, the best evolved genetic program
obtained in the learning process is then applied to the entire images in the test set to measure object detection performance.
The learning/evolutionary process in our GP approach is
summarised as follows.
(1) Initialise the population.
(2) Repeat until a termination criterion is satisfied.
(2.1) Evaluate the individual programs in the current
population. Assign a fitness to each program.
(2.2) Until the new population is fully created, repeat
the following:
(i) select programs in the current generation;
(ii) perform genetic operators on the selected
programs;
(iii) insert the result of the genetic operations
into the new generation.
(3) Present the best individual in the population as the
output: the learned/evolved genetic program.
In this system, we used a tree-like program structure
to represent genetic programs. The ramped half-and-half
method was used for generating the programs in the initial
population and for the mutation operator. The proportional
selection mechanism and the reproduction, crossover, and
mutation operators were used in the learning process.
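The loop in steps (1)-(3) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the program representation and the genetic operators (random_program, crossover, mutate) are passed in as placeholders, selection is proportional (roulette wheel) with weights inverted because smaller fitness is better in this system, and the rate parameters follow Section 3.6.

```python
import random

def evolve(pop_size, max_generations, fitness, random_program,
           crossover, mutate, reproduction_rate=0.10, cross_rate=0.65):
    # (1) Initialise the population.
    population = [random_program() for _ in range(pop_size)]
    for _ in range(max_generations):
        # (2.1) Evaluate the individuals; smaller fitness is better here.
        scored = sorted(population, key=fitness)
        if fitness(scored[0]) == 0:        # ideal program found
            break
        # Proportional selection: invert fitness so better programs
        # (lower fitness) get larger weights.
        weights = [1.0 / (1.0 + fitness(p)) for p in population]
        # (2.2) Build the new generation.
        new_gen = scored[:int(reproduction_rate * pop_size)]  # reproduction
        while len(new_gen) < pop_size:
            if random.random() < cross_rate:
                parents = random.choices(population, weights=weights, k=2)
                new_gen.extend(crossover(*parents))
            else:
                parent = random.choices(population, weights=weights, k=1)[0]
                new_gen.append(mutate(parent))
        population = new_gen[:pop_size]
    # (3) Present the best individual as the output.
    return min(population, key=fitness)
```

With a toy representation where a "program" is just an integer and fitness is its absolute value, the loop stops as soon as a zero-fitness individual appears in the population.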

In the remainder of this section, we address the other aspects of the learning/evolutionary system: (1) determination
of the terminal set, (2) determination of the function set, (3)
development of a classification strategy, (4) construction of
the fitness measure, and (5) selection of the input parameters and determination of the termination strategy.
3.2. The terminal sets

For object detection problems, terminals generally correspond to image features. In our approach, we designed three different terminal sets: local rectilinear features, circular features, and pixel features. In all these cases, the features are statistical properties of regions of the image, and we refer to them as pixel statistics.
3.2.1 Terminal set I: rectilinear features

In the first terminal set, twenty pixel statistics, F1 to F20 in Table 2, are extracted from the input field as shown in Figure 3. The input field must be sufficiently large to contain the biggest object and some background, yet small enough to include only a single object. In this way, the evolved program, as a detector, could automate the human-eye process of identifying pixels/object centres which stand out from their local surroundings.
In Figure 3, the grey-filled circle denotes an object of interest and the square A1 B1 C1 D1 represents the input field.

[Figure 2: An overview of the GP approach for multiple-class object detection. Entire images (detection training set) are fed to the GP learning/evolutionary process, which produces genetic programs; these are then applied to entire images (detection test set) in the object detection (GP testing) stage to give the detection results.]

Table 2: Twenty pixel statistics. (SD: standard deviation.)

Pixel statistics      Regions and lines of interest
Mean    SD
F1      F2            big square A1 B1 C1 D1
F3      F4            small central square A2 B2 C2 D2
F5      F6            upper left square A1 E1 O G1
F7      F8            upper right square E1 B1 H1 O
F9      F10           lower left square G1 O F1 D1
F11     F12           lower right square O H1 C1 F1
F13     F14           central row of the big square G1 H1
F15     F16           central column of the big square E1 F1
F17     F18           central row of the small square G2 H2
F19     F20           central column of the small square E2 F2

The five smaller squares represent local regions from which pixel statistics will be computed. The 4 central lines (rows and columns) are also used for a similar purpose.1 The mean and standard deviation of the pixels comprising each of these regions are used as two separate features. There are 6 regions, giving 12 features, F1 to F12. We also use the pixels along the main axes (4 lines) of the input field, giving features F13 to F20.

In addition to these pixel statistics, we use a terminal which generates a random constant in the range [0, 255]. This corresponds to the range of pixel intensities in grey-level images.

These pixel statistics have the following characteristics.

(i) They are symmetrical.

(ii) Local regional features (from small squares and lines) are included. This assists the finding of object centres in the sweeping procedure: if the evolved program is considered as a moving window template, the match between the template and the subimage forming the input field will be better when the moving template is close to the centre of an object.

(iii) They are domain-independent and easy to extract. These features belong to the pixel level and can be part of a domain-independent preexisting feature library of terminals from which the GP evolutionary process is expected to automatically learn and select only those relevant to a particular domain. This is quite different from the traditional image processing and computer vision approaches, where problem-specific features are often needed.

(iv) The number of these features is fixed. In this approach, the number of features is always twenty no matter what size the input field is. This is particularly useful for the generalisation of the system implementation.

1 These lines can be considered special local regions. If the input field size n is an even number, each of these lines is a rectangle consisting of two rows or two columns of pixels.
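The twenty rectilinear features can be computed as in the following plain-Python sketch. The helper names and the exact index arithmetic are ours (assuming an even input field size n and the default small-square size n/2); only the region layout follows Table 2.

```python
def mean_sd(vals):
    # Mean and (population) standard deviation of a list of pixels.
    m = sum(vals) / len(vals)
    sd = (sum((v - m) ** 2 for v in vals) / len(vals)) ** 0.5
    return m, sd

def region(img, r0, r1, c0, c1):
    # Pixels of img in rows [r0, r1) and columns [c0, c1).
    return [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]

def rectilinear_features(img):
    # img: an n x n input field (list of pixel rows), n even for simplicity.
    n = len(img)
    h, q = n // 2, n // 4
    regions = [
        region(img, 0, n, 0, n),           # F1, F2: big square
        region(img, q, q + h, q, q + h),   # F3, F4: small central square
        region(img, 0, h, 0, h),           # F5, F6: upper left square
        region(img, 0, h, h, n),           # F7, F8: upper right square
        region(img, h, n, 0, h),           # F9, F10: lower left square
        region(img, h, n, h, n),           # F11, F12: lower right square
        region(img, h, h + 1, 0, n),       # F13, F14: central row, big square
        region(img, 0, n, h, h + 1),       # F15, F16: central column, big square
        region(img, h, h + 1, q, q + h),   # F17, F18: central row, small square
        region(img, q, q + h, h, h + 1),   # F19, F20: central column, small square
    ]
    feats = []
    for pixels in regions:
        feats.extend(mean_sd(pixels))
    return feats   # [F1, ..., F20]
```

On a uniform window every mean equals the pixel value and every standard deviation is zero, which is a convenient sanity check for the indexing.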

3.2.2 Terminal set II: circular features

The second terminal set is based on a number of circular features, as shown in Figure 4. The features are computed from a series of concentric circles centred in the input field, so this terminal set focuses on boundaries rather than regions. The gap between the radii of two neighbouring circles is one pixel. For instance, if the input field is 19 × 19 pixels, then the number of central circles will be ⌊19/2⌋ + 1 = 10 (the central pixel is considered as a circle with a zero radius); accordingly, there would be 20 features. Compared with the rectilinear terminal set, the number of circular features in this terminal set depends on the size of the input field.
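The dependence of the feature count on the input field size can be sketched as follows (hypothetical helper names; ring membership by rounded Euclidean distance is our choice of discretisation, which the text does not specify):

```python
def circular_feature_count(n):
    # One circle per one-pixel radius step, plus the central pixel
    # (a circle of zero radius); each gives a mean and an SD feature.
    circles = n // 2 + 1
    return circles, 2 * circles

def ring_pixels(img, radius):
    # Pixels whose rounded distance from the field centre equals `radius`.
    n = len(img)
    c = (n - 1) / 2.0
    return [img[r][k] for r in range(n) for k in range(n)
            if round(((r - c) ** 2 + (k - c) ** 2) ** 0.5) == radius]
```

For a 19 × 19 input field this gives 10 circles and 20 features, matching the example above.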
3.2.3 Terminal set III: pixels

The goal of this terminal set is to investigate the use of raw pixels as terminals in GP. To decrease the computation cost, we considered a 2 × 2 square, or 4 pixels, as a single pixel. The average value of the 4 pixels in the square was used as the value of this pixel, as shown in Figure 5.
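The 2 × 2 averaging can be sketched as follows (illustrative function name; assumes an even image size):

```python
def downsample_2x2(img):
    # Replace each non-overlapping 2 x 2 block of pixels by the mean
    # of its 4 values, quartering the number of pixel terminals.
    n = len(img)
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(n // 2)]
            for r in range(n // 2)]
```

A 14 × 14 input field thus yields a 7 × 7 grid of pixel terminals.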
3.3. The function sets

We used two different function sets in the experiments: the 4 arithmetic operations only, and a combination of arithmetic and transcendental functions.
3.3.1 Function set I

In the first function set, the 4 standard arithmetic operations were used to form the nonterminal nodes:

    FuncSet1 = {+, −, ×, /}.    (1)

The +, −, and × operators have their usual meanings of addition, subtraction, and multiplication, while / represents protected division, which is the usual division operator
except that a divide by zero gives a result of zero. Each of these functions takes two arguments. This function set was designed to investigate whether the 4 standard arithmetic functions are sufficient for the multiple-class object detection problems.

A generated program consisting of the 4 functions and a number of rectilinear terminals is shown in Figure 6. The LISP form of this program is shown in Figure 7. This program performed particularly well for the coin images.

[Figure 3: The input field and the image regions and lines for feature selection in constructing terminals. The input field is the n × n square A1 B1 C1 D1 with centre O. Squares: A1 B1 C1 D1, A2 B2 C2 D2, A1 E1 O G1, E1 B1 H1 O, G1 O F1 D1, O H1 C1 F1. Rows and columns (lines): G1 H1, E1 F1, G2 H2, E2 F2. The side of the small square (G2 H2 = A2 B2 = E2 F2 = B2 C2) is user defined, with default n/2.]

[Figure 4: The input field and the image boundaries for feature extraction in constructing terminals. The local boundaries are concentric circles C1, C2, ..., Cn around the central pixel; the central pixel gives features F1 and F2, and the mean and SD of the pixels on circular boundary Ci give features F(2i+1) and F(2i+2).]

[Figure 5: Pixel terminals.]
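Protected division, as used in FuncSet1, can be sketched as:

```python
def protected_div(a, b):
    # The usual division, except that dividing by zero
    # yields zero instead of raising an error.
    return a / b if b != 0 else 0.0
```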

3.3.2 Function set II

We also designed a second function set. We hypothesized that convergence might be quicker if the function values were close to the range (−1, 1), and that more functions might lead to better results if the 4 arithmetic functions were not sufficient. We introduced some transcendental functions, that is, the absolute value function dabs, the trigonometric sine function sin, the logarithm function log, and the exponential (base e) function exp, to form the second function set:

    FuncSet2 = {+, −, ×, /, dabs, sin, log, exp}.    (2)

3.4. Object classification strategy

The output of a genetic program in a standard GP system is a floating point number. Genetic programs can be


[Figure 6: A generated program for the coin detection problem, shown as a mathematical expression over the rectilinear terminals F1 to F20 and the constants 133.082 and 145.765; its LISP form is given in Figure 7.]

(+ (- (+ (+ (/ F16 F14) F5) (+ (/ (/ F11 (* F14 F20)) F11) (- F12 F14)))
      (- (* (- (* (* (* F9 F11) F1) F10) (* F9 F17)) (/ F5 F18))
         (- (+ (+ F17 (* (+ F11 F12) F20))
               (* (- (+ F2 145.765) (/ F6 F11)) (- 133.082 F17)))
            (/ F11 (* F14 F20)))))
   (* (- (* (- (- F6 F5) (* F3 F6))
            (/ (+ (+ F1 145.765) (* F16 F10)) F18)) F12)
      (+ (+ F17 (* (+ F17 F12) F20))
         (* (+ F14 F12) (- (+ F1 F12) F17)))))

Figure 7: LISP format of the generated program in Figure 6.

used to perform one-class object detection tasks by utilising the division between negative and nonnegative numbers of a genetic program output. For example, negative numbers can correspond to the background and nonnegative numbers to the objects in the (single) class of interest. This is similar to binary classification problems in standard GP, where the division between negative and nonnegative numbers acts as a natural boundary for a distinction between the two classes. Thus, genetic programs generated by the standard GP evolutionary process primarily have the ability to represent and process binary classification or one-class object detection tasks. However, for the multiple-class object detection problems described here, where more than two classes of objects of interest are involved, the standard GP classification strategy mentioned above cannot be applied.
In this approach, we develop a different strategy which uses a program classification map, as shown in Figure 8, for the multiple-class object detection problems. Based on the output of an evolved genetic program, this map can identify which class the object located in the current input field belongs to. In this map, m refers to the number of object classes of interest, v is the output value of the evolved program, and T is a constant defined by the user, which plays the role of a threshold.
3.5. The fitness function
Since the goal of object detection is to achieve both a high DR
and a low FAR, we should consider a multiobjective fitness
function in our GP system for multiple-class object detection
problems. In this approach, the fitness function is based on

a combination of the DR and the FAR on the images in the


training set during the learning process. Figure 9 shows the
object detection procedure and how the fitness of an evolved
genetic program is obtained.
The fitness of a genetic program is obtained as follows.
(1) Apply the program as a moving n × n window template (n is the size of the input field) to each of the training images and obtain the output value of the program at each possible window position. Label each window position with the detected object according to the object classification strategy described in Figure 8. Call this data structure a detection map. An object in a detection map is associated with a floating point program output.
(2) Find the centres of objects of interest only. This is done
as follows. Scan the detection map for an object of interest. When one is found, mark this point as the centre
of the object and continue the scan n/2 pixels later in
both horizontal and vertical directions.
(3) Match these detected objects with the known locations
of each of the desired true objects and their classes. A
match is considered to occur if the detected object is
within tolerance pixels of its known true location. A
tolerance of 2 means that an object whose true location is (40, 40) would be counted as correctly located
at (42, 38) but not at (43, 38). The tolerance is a constant parameter defined by the user.
(4) Calculate the DR and the FAR of the evolved program.
(5) Compute the fitness of the program as follows:

    fitness(FAR, DR) = W_f × FAR + W_d × (1 − DR),    (3)
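Steps (3)-(5) can be sketched as follows. This is a simplified illustration: the function name is ours, objects are (x, y, class) triples, matching is per-axis within `tolerance` pixels (consistent with the (42, 38) example in step (3)), and we assume the DR and the FAR are expressed as fractions of the number of true objects; the text does not spell out these normalisations.

```python
def detection_fitness(detected, true_objects, tolerance, w_f, w_d):
    # detected, true_objects: lists of (x, y, class) triples.
    unmatched = list(true_objects)
    false_alarms = 0
    for (x, y, cls) in detected:
        # Step (3): a detection matches a true object of the same class
        # if both coordinates are within `tolerance` pixels.
        match = next((t for t in unmatched
                      if t[2] == cls and abs(t[0] - x) <= tolerance
                      and abs(t[1] - y) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
        else:
            false_alarms += 1
    # Step (4): detection rate and false alarm rate.
    dr = 1.0 - len(unmatched) / len(true_objects)
    far = false_alarms / len(true_objects)
    # Step (5): smaller is better; zero is the ideal case.
    return w_f * far + w_d * (1.0 - dr)
```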

The map is:

    Class = background    if v < 0,
            class i       if (i − 1) × T ≤ v < i × T, for i = 1, ..., m − 1,
            class m       if v ≥ (m − 1) × T.

Figure 8: Mapping of program output to an object classification.
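The mapping in Figure 8 can be sketched as (illustrative function name):

```python
def classify(v, m, T):
    # Map a program output v to the background or one of m classes:
    # background for v < 0, class i for (i-1)*T <= v < i*T, and
    # class m for all outputs of at least (m-1)*T.
    if v < 0:
        return "background"
    return "class %d" % min(int(v // T) + 1, m)
```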

where W_f and W_d are constant weights which reflect the relative importance of FAR versus DR.2

With this design, the smaller the fitness, the better the performance. Zero fitness is the ideal case, which corresponds to the situation in which all of the objects of interest in each class are correctly found by the evolved program without any false alarms.

2 Theoretically, W_f and W_d could be replaced by a single parameter since they have only one degree of freedom. However, using a single parameter or two parameters has different effects for stopping the evolutionary process. For convenience, we use two parameters.

[Figure 9: Object detection and fitness calculation. Sweep programs on training images, find object centres, match objects, calculate the DR and the FAR, and compute the fitness.]

3.6. Main parameters

Once a GP system has been created, one must choose a set of parameters for a run. Based on the roles they play in the learning/evolutionary process, we group these parameters into three categories: search parameters, genetic parameters, and fitness parameters.

3.6.1 Search parameters

The search parameters used here include the number of individuals in the population (population-size), the maximum
depth of the randomly generated programs in the initial population (initial-max-depth), the maximum depth permitted
for programs resulting from crossover and mutation operations (max-depth), and the maximum generations the evolutionary process can run (max-generations). These parameters control the search space and when to stop the learning
process. In theory, the larger these parameters, the greater the chance of success. In practice, however, it is impossible to set them very large due to the limitations of the hardware and the high cost of computation.
There is another search parameter, the size of the input
field (input-size), which decides the size of the moving window in which a genetic program is computed in the program
sweeping procedure.
3.6.2 Genetic parameters

The genetic parameters decide the number of genetic programs used/produced by different genetic operators in the mating pool to produce new programs in the next generation. These parameters include the percentage of the best individuals in the current population that are copied unchanged to the next generation (reproduction-rate), the percentage of individuals in the next generation that are to be produced by crossover (cross-rate), the percentage of individuals in the next generation that are to be produced by mutation (mutation-rate = 100% − reproduction-rate − cross-rate), the probability that, in a crossover operation, two terminals will be swapped (cross-term), and the probability that, in a crossover operation, random subtrees will be swapped (cross-func = 100% − cross-term).
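The two derived rates are fixed by these constraints, as this small check illustrates (hypothetical helper name; values from the easy-image column of Table 3):

```python
def derived_rates(reproduction_rate, cross_rate, cross_term):
    # mutation-rate = 100% - reproduction-rate - cross-rate
    # cross-func    = 100% - cross-term
    mutation_rate = 100 - reproduction_rate - cross_rate
    cross_func = 100 - cross_term
    return mutation_rate, cross_func
```

For the easy images (10% reproduction, 65% crossover, 15% cross-term), this gives a 25% mutation rate and an 85% cross-func rate, as in Table 3.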
3.6.3 Fitness parameters

The fitness parameters include a threshold parameter (T)


in the object classification algorithm, a tolerance parameter



Table 3: Parameters used for GP training for the three databases.

Parameter kinds       Parameter names      Easy images   Coin images   Retina images
Search parameters     Population-size      100           500           700
                      Initial-max-depth    4             5             6
                      Max-depth            8             12            20
                      Max-generations      100           150           150
                      Input-size           14 × 14       24 × 24       16 × 16
Genetic parameters    Reproduction-rate    10%           1%            2%
                      Cross-rate           65%           74%           73%
                      Mutation-rate        25%           25%           25%
                      Cross-term           15%           15%           15%
                      Cross-func           85%           85%           85%
Fitness parameters    T                    100           100           100
                      W_f                  50            50            50
                      W_d                  1000          1000          3000
                      Tolerance (pixels)   2             2             2

(tolerance) in object matching, and two constant weight parameters (W_f and W_d) reflecting the relative importance of the DR and the FAR in obtaining the fitness of a genetic program.

3.6.4 Parameter values

Good selection of these parameters is crucial to success. The parameter values can be very different for various object detection tasks. However, there does not seem to be a reliable way of deciding these parameter values a priori. To obtain good results, these parameter values were carefully chosen through an empirical search in experiments. The values used are shown in Table 3.

For detecting circles and squares in the easy images, for example, we set the population size to 100. On each iteration, 10 programs are created by reproduction, 65 programs by crossover, and 25 by mutation. Of the 65 crossover programs, 10 (15%) are generated by swapping terminals and 55 (85%) by swapping subtrees. The programs are randomly initialised with a maximum depth of 4 at the beginning, and the depth can be increased to 8 during the evolutionary process. We also use 100, 50, 1000, and 2 as the constant parameters T, W_f, W_d, and tolerance, which are used for the program classification and the calculation of the fitness function. The maximum number of generations permitted for the evolutionary process is 100 for this detection problem. The size of the input field is the same as that used in the NN approach [12], that is, 14 × 14.

3.7. Termination criteria

In this approach, the learning/evolutionary process is terminated when one of the following conditions is met.

(i) The detection problem has been solved on the training set, that is, all objects in each class of interest in the training set have been correctly detected with no false alarms. In this case, the fitness of the best individual program is zero.

(ii) The number of generations reaches the predefined number, max-generations. Max-generations was determined empirically in a number of preliminary runs as a point before overtraining generally occurred. While it would have been possible to use a validation set to determine when to stop training, we have not done this. Comparison of training and test DRs and FARs indicated that overfitting was not significant.

4. THE IMAGE DATABASES

We used three different databases in the experiments. Example images and key characteristics are given in Figure 10. The databases were selected to provide detection problems of increasing difficulty. Database 1 (easy) was generated to give well-defined objects against a uniform background. The pixels of the objects were generated using a Gaussian generator with different means and variances for each class. There are three classes of small objects of interest in this database: black circles (class1), grey squares (class2), and white circles (class3). The Australian coin images (database 2) were intended to be somewhat harder and were taken with a CCD camera over a number of days with relatively similar illumination. In these images, the background varies slightly in different areas of the image and between images, and the objects to be detected are more complex, but still regular. There are 4 object classes of interest: the head side of 5-cent coins (class head005), the head side of 20-cent coins (class head020), the tail side of 5-cent coins (class tail005), and the tail side of 20-cent coins (class tail020). All the objects in each class have a similar size. They are located at arbitrary positions and with some rotations. The retina images (database 3) were taken by a professional photographer with special apparatus at a clinic and contain very irregular objects on a very

[Figure 10: Object detection problems of increasing difficulty. (a) Easy (circles and squares): 10 images, 3 object classes, image size 700 × 700. (b) Medium difficulty (coins): 20 images, 4 object classes, image size 640 × 680. (c) Very difficult (retinas): 15 images, 2 object classes, image size 1024 × 1024.]

Table 4: Three groups of experiments.

Experiments   Terminal sets            Function sets
I             TermSet1 (rectilinear)   FuncSet1
              TermSet2 (circular)      FuncSet1
II            TermSet3 (pixels)        FuncSet1
III           TermSet1 (rectilinear)   FuncSet2

[Figure 11: An enlarged view of one piece of the retina images.]

cluttered background. The objective is to find two classes of retinal pathologies: haemorrhages and microaneurisms. To give a clear view of representative samples of the target objects in the retina images, one sample piece of these images is presented in Figure 11. In this figure, haemorrhage and microaneurism examples are labeled using white surrounding squares.
5. EXPERIMENTAL RESULTS

We performed three groups of experiments, as shown in Table 4. The first group of experiments is based on the first two terminal sets (rectilinear features and circular features) and the first function set (the 4 standard arithmetic functions). The second group of experiments uses the third terminal set, consisting of raw pixels, and the first function set. The third group of experiments uses the first terminal set, consisting of rectilinear features, and the second function set, consisting of additional transcendental functions.

In these experiments, 4 out of 10 images in the easy image database are used for training and 6 for testing. For the
coin images, 10 out of 20 are used for training and 10 for
testing. For the retina images, 10 are used for training and
5 for testing. The total number of objects is 300 for the easy
image database, 400 for the Australian coin images, and 328
for the retina images. The results presented in this section
were achieved by applying the evolved genetic programs to
the images in the test sets.
5.1. Experiment I

This group constitutes the major part of the investigation. The main goal here is to investigate whether this GP approach can be applied to multiple-class object detection problems of increasing difficulty. The parameters used in these experiments are shown in Table 3 (Section 3.6.4). The average performance of the best 10 genetic programs (evolved from 10 runs) for the easy and the coin databases, and the average performance of the best 5 genetic programs (out of 5 runs, due to the high computational cost) for the retina images are presented.

The results are compared with those obtained using an NN approach for object detection on the same databases

[12, 39]. The NN method used was the same as the GP method shown in Section 1.1, except that the evolutionary process was replaced by a network training process in step (3) and the generated genetic program was replaced by a trained network. In this group of experiments, the networks also used the same set of pixel statistics as TermSet1 (rectilinear) as inputs. Considerable effort was expended in determining the best network architectures and training parameters. The results presented here are the best results achieved by the NNs, and we believe that the comparison with the GP approach is a fair one.
5.1.1 Easy images

Table 5 shows the best results of the GP approach with the two different terminal sets (GP1 with TermSet1, GP2 with TermSet2) and the NN method for the easy images. For class1 (black circles) and class3 (white circles), all three methods achieved a 100% DR with no false alarms. For class2 (grey squares), the two GP methods also achieved a 100% DR with zero false alarms. However, the NN method had an FAR of 91.2% at a DR of 100%.
5.1.2 Coin images

Experiments with the coin images gave similar results to the easy images. These are shown in Table 6. Detecting the heads and tails of 5-cent coins (classes head005, tail005) appears to be relatively straightforward: all three methods achieved a 100% DR without any false alarms. Detecting heads and tails of 20-cent coins (classes head020, tail020) is more difficult. While the NN method resulted in many false alarms, the two GP methods had much better results. In particular, the GP1 method achieved the ideal result, that is, all the objects of interest were correctly detected without any false alarms for all the 4 object classes.
5.1.3 Retina images

The results for the retina images are summarised in Table 7. Compared with the results for the other image databases, these results are not satisfactory.3 However, the FAR is greatly improved over the NN method.

The results over the three databases show similar patterns: the GP-based method always gave a lower FAR than the NN approach for the same detection rate. While GP2 also gave the ideal results for the easy images, it produced a higher FAR on both the coin and the retina images than the GP1 method. This suggests that the local rectilinear features are more effective for these detection problems than the circular features.
5.1.4 Training times

We performed these experiments on a 4-processor ULTRA-SPARC4. The training times for the three databases are very

3 With the current techniques applied in this area, detecting objects in images with a highly cluttered background is an extremely difficult problem [5, 16]. In fact, these results are quite competitive with other methods for very difficult detection problems. As a young discipline, it is quite promising for GP to achieve such results.



Table 5: Comparison of the object detection results for the easy
images: the GP approaches versus the NN approach. (Input field
size = 14 14; repetitions = 10.)
Easy images

class1

Best detection rate (%)


False alarm rate (%)

NN
GP1
GP2

Object classes
class2
class3

100

100

100

0
0
0

91.2
0
0

0
0
0

Table 6: Comparison of the object detection results for the coin images: the GP approaches versus the NN approach. (Input field size = 24 × 24; repetitions = 10.)

Coin images                        head005   tail005   head020   tail020
Best detection rate (%)            100       100       100       100
False alarm rate (%)    NN         0         0         182       37.5
                        GP1        0         0         0         0
                        GP2        0         0         38.4      26.7

Table 7: Comparison of the object detection results for the retina images: the GP approaches versus the NN approach. (Input field size = 16 × 16; repetitions = 5.)

Retina images                      Haem     Micro
Best detection rate (%)            73.91    100
False alarm rate (%)    NN         2859     10104
                        GP1        1357     588
                        GP2        1857     732

different due to the various degrees of difficulty of the detection problems. The average training times used in the GP evolutionary process (GP1) for the easy, the coin, and the retina images are 2 minutes, 36 hours, and 93 hours, respectively.4 This is much longer than the NN method, which took 2 minutes, 35 minutes, and 2 hours on average. However, the GP method gave much better detection results on all three databases. This suggests that the GP method is particularly applicable to tasks where accuracy is the most important factor and training time is seen as relatively unimportant.

4 Even if the training time for difficult problems is very long, the time spent on applying the learned genetic program to the test set is usually very short, say, from several seconds to about one minute.


Table 8: Results with the second function set.

                           Easy images               Coin images                              Retina images
                           Class1  Class2  Class3    Head005  Tail005  Head020  Tail020      Haem    Micro
Best detection rate (%)    100     100     100       100      100      100      100          73.91   100
False alarm rate (%)       0       0       0         0        0        0        0            1214    463

5.2. Experiment II

Instead of using rectilinear and circular features (pixel statistics) as in experiment I, experiment II directly uses the pixel values as terminals (the third terminal set). For the input field sizes of 14 × 14, 24 × 24, and 16 × 16 for the easy, the coin, and the retina images, the numbers of terminals are 49 (7 × 7), 144 (12 × 12), and 64 (8 × 8), respectively. For the easy images, the learning took about 70 hours and 78 generations on a 4-processor ULTRA-SPARC4 machine to reach perfect detection performance on the training set. The population size used was 1000, the maximum depth of the program was 30, the maximum initial depth 10, and the maximum number of generations 100. For the coin images and the retina images, the situation was worse. Since a large number of terminals were used, the maximum depth of the program trees was increased to 50 for the coin images and 60 for the retina images. The population size for both databases was 3000, with a maximum number of generations of 100. The evolutionary process took three weeks to complete 50 generations for the coin images and five weeks to complete 50 generations for the retina images. The best detection results were an overall 22% FAR at a 100% DR for the coin images, and about 850% FAR at a DR of 100% for microaneurisms in the retina images.

While these results are worse than those obtained by GP1 and GP2 using the rectilinear and circular features, they are still better than the NN approach. If we used a larger population (e.g., 10000 or 50000), a larger program size (e.g., 100), and a larger number of generations (e.g., 300), the results could be better according to our experience. While this is not possible to investigate with the hardware we currently use, it shows a promising future direction with the improvement and development of more powerful hardware, for example, parallel or genetic hardware.
5.3. Experiment III

Instead of using the four standard arithmetic functions, this experiment focused on using the extended function set (FuncSet2) shown in Section 3.3.2. The parameters shown in Table 3 (Section 3.6.4) were used in this experiment. The best detection results for the three databases are shown in Table 8.

As can be seen from Table 8, this function set also gave ideal results for the easy and the coin images and a better result for the retina images. The best DR for detecting micro is 100%, with a corresponding FAR of 463%. The best DR for haem is still 73.91%, but the FAR is reduced to 1214%. In addition, convergence was slightly faster for training on the coin and retina images. This suggests that dabs, sin, log, and exp are particularly useful for more difficult problems.
6. DISCUSSION

6.1. Analysis of results on the retina images

The GP-based approach achieved the ideal results on the easy images and the coin images, but resulted in some false alarms on the retina images, particularly for the detection of objects in class haem, for which the FAR was very high and more than a quarter of the real objects in this class were not detected by the evolved genetic program.

We identified two possible reasons for the results on the retina images being worse than the results on the easy and the coin images. The first reason concerns the complexity of the background. In the easy and coin images, the background is relatively uniform, whereas in the retina images it is highly cluttered. In particular, the background of the retina images contains many objects, such as veins and other anatomical features, that are not members of the two classes of interest (microaneurisms and haemorrhages). These objects of noninterest must be classified as background, in just the same way as the genuine background. The more complex the boundary between classes in the input space, the more complex an evolved program has to be to distinguish the classes. It may be that the more complex background class in the retina images requires a more complex evolved program than the GP system was able to discover. It may even be that the set of terminals and functions is not adequate/sufficient to represent an evolved program that distinguishes the objects of interest from such a rich background.

The second possible reason concerns the variation in size of the objects. In the easy and coin images, all of the objects in a class have similar sizes, whereas in the retina images, the sizes of the objects in each class vary. This variation means that the evolved genetic program must cover a more complicated region of the input space. The sizes of the micro objects vary from 3 × 3 to 5 × 5 pixels and the sizes of the haem objects vary from 6 × 6 to 14 × 14 pixels. Given the size of the input field (16 × 16) and the choice of terminals, the variance in the size of the haem objects is particularly problematic since it ranges from just one quarter of the input field (hence entirely inside the central detection region) to almost the entire input field. The fact that the performance on the haem class is worse than the performance on the micro class (especially in experiment III) provides

854

EURASIP Journal on Applied Signal Processing

[Figure: expression trees of the three evolved programs (Program 1, Program 2, Program 3), built from terminals F3–F19 and the operators +, −, ×, /.]
Figure 12: Three sample generated programs for simple object detection in the easy images.

(/ (+ (- (- (/ (* (* F3 F14) F15) (* F6 F14)) F19) (* (* (* (* F7 F10) F17) F16) F18)) (+ (* (/ F5 F5) F14) (/ F3 F5))) (+ (* (/ F3 F5) (/ (/ F5 F6) (/ F11 F15))) (/ (/ F19 F6) (/ F5 F15))))
Figure 13: LISP format of Program 1.
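Programs in this LISP format can be applied directly to the feature values. The sketch below is a hypothetical evaluator, not the authors' implementation; the feature values and the divide-by-zero guard are illustrative assumptions.

```python
# Hypothetical evaluator for evolved s-expression programs (not the authors' code).
# Terminals F1..F20 are the pixel-statistics features; +, -, *, / are the four
# standard arithmetic functions used in the paper.

def evaluate(expr, features):
    """Evaluate a parsed s-expression against a dict {'F1': value, ...}."""
    if isinstance(expr, str):        # terminal: feature name
        return features[expr]
    if isinstance(expr, (int, float)):  # numeric constant
        return expr
    op, *args = expr
    vals = [evaluate(a, features) for a in args]
    if op == '+': return vals[0] + vals[1]
    if op == '-': return vals[0] - vals[1]
    if op == '*': return vals[0] * vals[1]
    if op == '/': return vals[0] / vals[1] if vals[1] != 0 else 1.0  # protected division (assumed)
    raise ValueError(op)

# A fragment of Program 1: (+ (* (/ F5 F5) F14) (/ F3 F5))
prog = ('+', ('*', ('/', 'F5', 'F5'), 'F14'), ('/', 'F3', 'F5'))
feats = {'F3': 6.0, 'F5': 2.0, 'F14': 4.0}   # illustrative feature values
print(evaluate(prog, feats))                 # 4.0 + 3.0 = 7.0
```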

additional evidence that the size variation is a cause of the


poor performance.
The first reason suggests that the current approach is limited on images containing cluttered backgrounds. One possible modification to address this limitation is to evolve multiple programs rather than a single program, either having
a separate program for each class of interest, or having several programs to exclude different parts of the background.
Another possible modification is to extend the terminal set
and/or function set to enrich the expressive power of the
evolved programs.
The second reason suggests that the current approach has
limited applicability to scale invariant detection problems.
This would not be surprising, given the current set of terminals and functions. In particular, although the pixel statistics
used in the rectilinear and circular terminal sets are robust
to small variations in scale, they are not robust to large variations. We will explore alternative pixel statistics that are more
robust to scale variations, and also function sets that would
allow disjunctive programs that could better represent classes
that contained objects of several different size ranges.

6.2. Analysis of evolved programs

This section gives a brief analysis of the best generated programs for the three databases. The genetic programs evolved by GP1 in experiment I are used as examples.

6.2.1 Easy images

Figure 12 shows three good sample evolved programs for the easy images. (These programs were the direct mathematical conversion of the original LISP-format programs evolved by the evolutionary process. The LISP format of the first program is, for example, shown in Figure 13. Note that we did not simplify them; simplification of evolved genetic programs is beyond the goal of this paper.) All of these programs achieved the ideal results: all of the circles and squares were correctly detected with no false alarms.

There are several things we can note about these programs. Firstly, the programs are not trivial, and are decidedly nonlinear. It is hard to interpret these programs even for the easy images. Secondly, the programs use many, but not all, of the terminals, but do not use any constants. There are no groups of terminals that are unused: both the means and standard deviations of both the square regions and the lines are used in the programs, so it does not appear that any of the terminals could be safely removed. Thirdly, although the programs are not in their simplest form (e.g., the factor F5/F5 could be removed from the first program), there is not a large amount of redundancy, so the GP search is finding reasonably efficient programs.

6.2.2 Coin images

In addition to the program shown in Figure 6, we present another generated program in Figure 14, which also performed
perfectly for the coin images.
Compared with those for the easy images, these programs
are more complex, which reflects the greater difficulty of the detection problem in the coin images. One difference is that
these programs also contain constants. The set of possible
programs is considerably expanded by allowing constants as
well as the terminals, but the search for good values for the

Multiclass Object Detection Using Genetic Programming

855

[Figure: expression tree of the evolved program, built from terminals F1–F19 and the constant 87.251.]
Figure 14: A sample generated program for regular object detection in the coin images.

constants is difficult. Our current GP is biased so that constants are only introduced rarely, but it is clear that the detection problem on the coin images is sufficiently difficult to
require some of these constants.
6.2.3 Retina images
One evolved genetic program for the retina images is presented in Figure 15. (The program is presented in LISP format rather than standard format because of its complexity.)
This program is much more complex than any of the programs for the easy and the coin images. The program uses
all 20 terminals and 8 constants. It does not seem possible
to make any meaningful interpretation of this program. It
may be that with high-level, domain-specific features and
domain-specific functions, it would be possible for the GP
system to construct simpler and more interpretable programs; however, this would be against one of the goals of
this paper, which is to investigate domain-independent approaches.
Even the best programs for the retina images gave quite a
high number of false alarms, and it appears that the 20 terminals and 4 standard arithmetic functions are not sufficient for constructing programs for such difficult detection problems. Nonetheless, the program above still had much better
performance than an NN with the same input features.
6.3. Analysis of classification strategy
As described in Figure 8, we used a program classification
map as the classification strategy. In this map, a constant
T was used to give fixed-size ranges for determining the
classes of those objects from the output of the program. The
parameter can be regarded as a threshold or a class boundary
parameter. Using just a single value for T forces most of the
classes to have an equal possible range in the program output, which might lead to a relatively long evolution time.
A natural question to raise is whether we can replace the single parameter T with a set of parameters, say, T1, T2, ..., Tm, one for each class of interest.
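The idea of the single-parameter map can be sketched as follows. The exact ranges used in the paper are those of Figure 8; the fixed-size, equal ranges below are only an illustrative assumption.

```python
# Hypothetical single-parameter classification map: one threshold T defines
# equal-width output ranges, one per class, with everything else mapped to
# background. The exact ranges of the paper's Figure 8 may differ.

def classify(output, T, num_classes):
    """Map a program's numeric output to a class index (0 = background)."""
    for i in range(1, num_classes + 1):
        if (i - 1) * T <= output < i * T:
            return i
    return 0  # outputs outside all class ranges are background

print(classify(2.5, T=2.0, num_classes=3))  # falls in [2, 4) -> class 2
```

Replacing T with per-class parameters T1, ..., Tm, as discussed next, simply makes the range widths independent.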
To answer this question, we ran a set of experiments on the easy images with three parameters, T1, T2, and T3, for the thresholds in the program classification map. The

experiments showed that some sets of values of the parameters resulted in an ideal performance but other sets of values
did not. Also, the learning/evolutionary process converged
very fast with some sets of values but very slowly with others. However, the results of the experiments gave no guidelines for selecting a good set of values for these parameters.
In some cases, using separate parameters for each threshold
may lead to a better performance than using a single parameter, but appropriate values for the parameters need to be
empirically determined. In practice, this is difficult because
there is no a priori knowledge in most cases for setting these
parameters.
We also tried an alternative classification strategy, which
we called multiple binary map, to classify multiple classes of
objects. In this method, we convert a multiple-class classification problem to a set of binary classification problems. Given
a problem L with m classes L = {c1, c2, ..., cm}, the problem is decomposed into L1 = {c1, other}, L2 = {c2, other}, ..., Lm = {cm, other}, where ci denotes the ith class of interest and
other refers to the class of nonobjects of interest. In this way, a
multiple-class object detection problem is decomposed into
a set of one-class object detection tasks, and GP is applied to
each of the subsets to obtain the detection result for a particular class of interest. We tested this method on the detection
problems in the three image databases and the results were
similar to those of the original experiments.
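The decomposition just described amounts to a relabelling step, one binary problem per class of interest; the class names below are illustrative.

```python
# Sketch of the "multiple binary map" decomposition: an m-class problem
# L = {c1, ..., cm} becomes m binary problems Li = {ci, other}. A separate
# genetic program would then be evolved on each relabelled problem.

def decompose(classes):
    """Return one (target, relabelling) pair per class of interest."""
    problems = []
    for target in classes:
        relabel = {c: (c if c == target else 'other') for c in classes}
        problems.append((target, relabel))
    return problems

for target, relabel in decompose(['micro', 'haem']):
    print(target, relabel)
# micro {'micro': 'micro', 'haem': 'other'}
# haem {'micro': 'other', 'haem': 'haem'}
```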
One disadvantage of this method is that several genetic
programs have to be evolved. On the other hand, the genetic programs may be simpler, which may reduce the training time for each program. In fact, for the coin images problem, a considerably shorter total training time was required
to create a set of one-class programs than to create a single
multiple-class program. A more detailed discussion of this
method is outside the goal of this paper, and is left to future
work.
6.4. Analysis of crossover and mutation rates

Some GP researchers argue that mutation is useless and


should not be used in GP [32], while some others insist
that a high mutation rate would help the GP evolution converge [40, 41]. To investigate the effects of mutation in GP
for multiclass object detection problems, we carried out ten


(* (* (- (/ F6 (+ (* (/ (* F2 (/ (* F6 (+ F1 (- F10 F15 )))


(- (- F18 F17 ) (- F19 87.05))))
(+ 17.0792 (+ F9 F14 )))
(/ (+ F19 (* (+ (+ F11
(- (* (- (- F15 F18 ) (+ 40.58 F16 ))
(- (* F13 (+ (/ 57.64 F16 ) F13 ))
(- F9 F6 )))
(/ (* F3 F1 ) F1 )))
(* (- (* (- (/ (+ (+ F18 (+ (/ (/ F14 F6 )
(+ F6 F1 ))
89.70))
(* F10 F12 )) F2 ) F9 )
(+ (+ F16 14.75) F9 )) F18 )
(/ (/ F13 F1 ) (* (+ F6 F12 ) F9 ))))
(+ F16 F8 )))
(+ (- (- (+ (/ F10 (* F9 F6 )) F13 ) F10 ) F18 )
(+ (* (- (+ F1 F2 ) (+ F17 F8 )) F5 )
(* (* F20 F16 ) F10 )))))
(* (+ (- (* (+ F11
(+ (* F14 F3 )
(/ F15 (/ (+ (* F2 14.5251)
(* (* (/ (* F18
(/ (* F2 F13 ) F15 ))
F1 )
(/ (/ F11 F13 ) (/ F7 F5 )))
(+ (+ F18 (* F2 F13 ))
(/ F8 F12 ))))
F17 )))) F11 ) F16 )
(* (- F1 (+ F3 F8 )) F5 ))
(/ (+ (- F7 F20 ) F18 ) F20 ))))
(* (* (* (* F2 F13 ) F2 )
(/ (* F4 (/ (* F2 F13 ) F15 )) (* F18 F12 )))
(* F14 F2 )))
(+ (+ (- (+ (- F19 F3 ) F2 ) F7 ) (- (+ F8 F17 ) F18 ))
(/ (+ F15 60.10)
(* (* F1 (/ (/ F12 (- (+ (/ (/ F12 F13 ) (/ F15 F5 )) F17 ) F18 ))
(/ F7 F5 ))) F8 ))))
(* (/ (* F10 (/ (* F2 F13 ) F15 )) F18 )
(* (* (* (* F2 F2 ) (/ (/ (/ F18 (+ F1 F2 )) F13 )
(/ (/ (- F15 96.16) (* F4 14.53)) F5 ))) F4 )
(/ (/ F12 F13 ) (/ F1 (+ (/ F10 F1 ) F4 ))))))

Figure 15: A sample generated program for very difficult detection problems in the retina images.

experiments for different rates of mutation versus crossover


on the easy images, as shown in Figure 16. The reproduction rate was held constant at 10%, and the mutation rate
varied from 0% to 40%. The graph shows the distribution
of the number of generations to convergence by a box-and-whisker plot with the limits of the central box at the 30th and 70th percentiles. With both 0% and 40% mutation, the
search sometimes did not converge within the limit of 250
generations. There was a clear effect of the mutation rate on
the number of generations to convergence. The best mutation rate was 25%, where only 48 generations on average were
required to find a good solution, with slower convergence at

both lower and higher mutation rates. Experiments on the


coin and the retina images gave a similar trend. This suggests
that, in GP for the multiple-class object detection problems described in this paper, mutation plays an important role in maintaining the diversity of the population, and that convergence
could be sped up when an appropriate mutation rate was
used. However, such a good mutation rate is generally task
dependent, and 15%–30% is a good choice for similar tasks.
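The generational split described above (reproduction fixed at 10%, mutation varied, crossover presumably taking the remainder) can be sketched as a per-offspring operator draw. This is an illustration under those rates, not the authors' exact generational loop; the 65%/25%/10% split assumes the best-performing 25% mutation rate.

```python
import random

# Illustrative per-offspring operator selection. Rates are assumptions based on
# the text: reproduction 10%, mutation 25% (best setting), crossover the rest.
RATES = [('crossover', 0.65), ('mutation', 0.25), ('reproduction', 0.10)]

def choose_operator(rng):
    """Draw one genetic operator according to the configured rates."""
    r = rng.random()
    cumulative = 0.0
    for name, rate in RATES:
        cumulative += rate
        if r < cumulative:
            return name
    return RATES[-1][0]  # guard against floating-point round-off

rng = random.Random(0)
counts = {name: 0 for name, _ in RATES}
for _ in range(10000):
    counts[choose_operator(rng)] += 1
print(counts)  # roughly 6500 / 2500 / 1000
```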
6.5. Analysis of reproduction

In early GP, the reproduction rule did a probabilistic selection of genetic programs from the current population based


[Figure: box-and-whisker plot of generations to convergence (0 to 300) versus mutation rate (0% to 40%), titled "Generations for different mutation rates".]

Figure 16: Convergence versus mutation rate.

on their fitness and allowed them to survive by copying them


into the new population. The better the fitness, the more
likely the individual program is to be selected [24, 42]. However, this mechanism does not guarantee that the best program will survive. An alternative reproduction rule is one
that removes the probabilistic element, and simply reproduces the best n genetic programs from the current population. We ran experiments on the easy images with both reproduction rules and plotted the best fitness in each generation (see Figure 17). The dotted curve shows the best fitness with the probabilistic reproduction rule. Over the 100
generations, there are four clear intervals (at generations 7, 22, 45, and 67) where the fitness got worse rather than better,
which delayed the convergence of learning. In contrast, the
deterministic reproduction rule had a steady improvement
in fitness. Furthermore, the deterministic reproduction rule
converged on an ideal program after just 71 generations,
while the probabilistic reproduction rule had still not converged on an ideal program after 100 generations. (In fact,
the fitness did not improve at all during the final 30 generations!) Clearly, the new reproduction rule greatly improved
the training speed and convergence.
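The two reproduction rules can be contrasted in a short sketch. Here higher fitness is assumed to be better, and the population entries are placeholder (fitness, program) pairs, not the paper's actual programs.

```python
import random

# Probabilistic (fitness-proportional) copying versus deterministically
# copying the best n programs into the next generation.

def probabilistic_reproduction(population, n, rng):
    """Fitness-proportional selection: better programs are more likely to
    survive, but survival of the best is not guaranteed."""
    total = sum(f for f, _ in population)
    weights = [f / total for f, _ in population]
    return rng.choices(population, weights=weights, k=n)

def deterministic_reproduction(population, n):
    """Copy the n fittest programs; the best always survives."""
    return sorted(population, key=lambda p: p[0], reverse=True)[:n]

pop = [(0.9, 'p1'), (0.5, 'p2'), (0.1, 'p3'), (0.7, 'p4')]
print(deterministic_reproduction(pop, 2))   # [(0.9, 'p1'), (0.7, 'p4')]
```

The deterministic rule makes the best-so-far fitness monotonically non-decreasing, which matches the steady improvement seen in Figure 17.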
7. CONCLUSIONS

The goal of this paper was to develop a domain-independent,


learning/adaptive approach for detecting small objects of
multiple classes in large images based on GP. This goal was
achieved by the use of GP with a set of domain-independent
pixel statistics as terminals, a number of standard operators
as functions, and a linear combination of the DR and FAR
as the fitness measure. A secondary goal was to compare the
performance of this method with an NN method. Here the
GP approach outperformed the NN approach in terms of detection accuracy.
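A fitness measure that linearly combines the DR and the FAR might be sketched as follows; the weights K1 and K2 and the exact normalisation are illustrative assumptions, not the paper's precise definition.

```python
# Illustrative linear-combination fitness (lower is better, 0 is ideal).
# K1 and K2 are hypothetical weighting constants.

def fitness(detected, total_objects, false_alarms, K1=5000.0, K2=100.0):
    DR = detected / total_objects        # detection rate
    FAR = false_alarms / total_objects   # false alarms per true object
    return K1 * (1.0 - DR) + K2 * FAR

print(fitness(10, 10, 0))   # 0.0 -> every object found, no false alarms
print(fitness(8, 10, 3))    # 5000*0.2 + 100*0.3 = 1030.0
```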
The approach appears to be applicable to detection problems of varying difficulty as long as the objects are approximately the same size and the background is not too cluttered.
The paper differs from most work in object detection
in two ways. Most work addresses the one-class problem,
that is, object versus nonobject, or object versus background.
This paper has shown a way of solving a multiple-class object detection problem without breaking it into a collection

[Figure: best fitness versus generation (0 to 100) for the old and the new reproduction rules.]
Figure 17: Training easy images based on the old and the new reproduction rules.

of one-class problems. Also, most current research uses different algorithms in multiple independent stages to solve the
localisation problem and the classification problem; in contrast, this paper uses a single learned genetic program for
both object classification and object localisation.
The experiments showed that mutation does play an important role in the three multiple-class object detection tasks.
This is in contrast to Koza's early claim that GP does not need
mutation. For GP applied to multiple-class object detection
problems, the experiments suggest that a 15%–30% mutation rate would be a good choice.
The experiments also identified some limitations of the
particular approach taken in the paper. The first limitation concerns the choice of input features and the function set. For the simple and medium-difficulty object detection problems, the 20 regional/rectilinear features and 4
standard arithmetic functions performed very well; however,
they were not adequate for the most difficult object detection task. In particular, they were not adequate for detecting
classes of objects with a range of sizes. Further work will be
required to discover more effective domain-independent features and function sets, especially ones that provide some size
invariance.
A second limitation is the high training time required.
One aspect of this training time is the experimentation required to find good values of the various parameters for each
different problem. The GP method appears to be applicable
to multiple-class object detection tasks where accuracy is the
most important factor and training time is seen as relatively
unimportant, as is the case in most industrial applications.
Further experimentation may reveal more effective ways of determining parameters, which will reduce the training times.
Subject to these limitations, the paper has demonstrated that GP can be used effectively for the multiple-class

detection problem and provides more evidence that GP has
a great potential for application to a variety of difficult problems in the real world.
ACKNOWLEDGMENTS
We would like to thank Dr. James Thom at RMIT University
and Dr. Zhi-Qiang Liu at the University of Melbourne for a
number of useful discussions. Thanks also to Peter Wilson
whose basic GP package was used in this project and to Chris
Kamusinski who provided and labelled the retina images.
REFERENCES
[1] P. D. Gader, J. R. Miramonti, Y. Won, and P. Coffield, Segmentation free shared weight networks for automatic vehicle detection, Neural Networks, vol. 8, no. 9, pp. 1457–1473, 1995.
[2] A. M. Waxman, M. C. Seibert, A. Gove, et al., Neural processing of targets in visible, multispectral IR and SAR imagery,
Neural Networks, vol. 8, no. 7-8, pp. 1029–1051, 1995.
[3] Y. Won, P. D. Gader, and P. C. Coffield, Morphological shared-weight networks with applications to automatic target recognition, IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 1195–1203, 1997.
[4] H. L. Roitblat, W. W. L. Au, P. E. Nachtigall, R. Shizumura,
and G. Moons, Sonar recognition of targets embedded in
sediment, Neural Networks, vol. 8, no. 7-8, pp. 1263–1273,
1995.
[5] M. W. Roth, Survey of neural network technology for automatic target recognition, IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 28–43, 1990.
[6] D. P. Casasent and L. M. Neiberg, Classifier and shift-invariant automatic target recognition neural networks, Neural Networks, vol. 8, no. 7-8, pp. 1117–1129, 1995.
[7] S. K. Rogers, J. M. Colombi, C. E. Martin, et al., Neural networks for automatic target recognition, Neural Networks, vol.
8, no. 7-8, pp. 1153–1184, 1995.
[8] J. R. Sherrah, R. E. Bogner, and A. Bouzerdoum, The evolutionary pre-processor: automatic feature extraction for supervised classification using genetic programming, in Proc.
2nd Annual Conference on Genetic Programming (GP-97), J. R.
Koza, K. Deb, M. Dorigo, et al., Eds., pp. 304–312, Morgan
Kaufmann, Stanford, Calif, USA, July 1997.
[9] W. A. Tackett, Genetic programming for feature discovery
and image discrimination, in Proc. 5th International Conference on Genetic Algorithms, ICGA-93, S. Forrest, Ed., pp. 303–309, Morgan Kaufmann, Urbana-Champaign, Ill, USA, July
1993.
[10] J. F. Winkeler and B. S. Manjunath, Genetic programming
for object detection, in Proc. 2nd Annual Conference on Genetic Programming (GP-97), J. R. Koza, K. Deb, M. Dorigo,
et al., Eds., pp. 330–335, Morgan Kaufmann, Stanford, Calif,
USA, July 1997.
[11] A. Teller and M. Veloso, A controlled experiment: evolution
for learning difficult image classification, in Proc. 7th Portuguese Conference On Artificial Intelligence, C. Pinto-Ferreira
and N. J. Mamede, Eds., vol. 990 of Lecture Notes in Computer
Science, pp. 165–176, Springer-Verlag, Funchal, Madeira Island, Portugal, October 1995.
[12] M. Zhang and V. Ciesielski, Centred weight initialization
in neural networks for object detection, in Computer Science 99: Proc. 22nd Australasian Computer Science Conference,
J. Edwards, Ed., pp. 39–50, Springer-Verlag, Auckland, New
Zealand, January 1999.



[13] T. Caelli and W. F. Bischof, Machine Learning and Image Interpretation, Plenum Press, New York, NY, USA, 1997.
[14] O. Faugeras, Three-Dimensional Computer VisionA Geometric Viewpoint, MIT Press, Cambridge, Mass, USA, 1993.
[15] E. Gose, R. Johnsonbaugh, and S. Jost, Pattern Recognition and
Image Analysis, Prentice-Hall, Upper Saddle River, NJ, USA,
1996.
[16] M. V. Shirvaikar and M. M. Trivedi, A neural network filter to detect small targets in high clutter backgrounds, IEEE
Transactions on Neural Networks, vol. 6, no. 1, pp. 252257,
1995.
[17] P. Winter, S. Sokhansanj, H. C. Wood, and W. Crerar, Quality assessment and grading of lentils using machine vision,
in Canadian Society of Agricultural Engineering Annual Meeting at the Agricultural Institute of Canada Annual Conference,
Lethbridge, AB, Canada, July 1996, CSAE paper No. 96-310.
[18] E. Baum and D. Haussler, What size net gives valid generalization?, Neural Computation, vol. 1, no. 1, pp. 151–160,
1989.
[19] D. Howard, S. C. Roberts, and R. Brankin, Target detection
in SAR imagery by genetic programming, Advances in Engineering Software, vol. 30, no. 5, pp. 303–311, 1999.
[20] S.-H. Lin, S.-Y. Kung, and L.-J. Lin,
Face recognition/detection by probabilistic decision-based neural network, IEEE Transactions on Neural Networks, vol. 8, no. 1,
pp. 114–132, 1997.
[21] Y. LeCun, B. Boser, J. S. Denker, et al., Backpropagation applied to handwritten zip code recognition, Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
[22] W. A. Tackett, Recombination, selection, and the genetic construction of computer programs, Ph.D. thesis, Faculty of the
Graduate School, University of Southern California, Canoga
Park, Calif, USA, April 1994.
[23] D. Andre, Automatically defined features: the simultaneous evolution of 2-dimensional feature detectors and an algorithm for using them, in Advances in Genetic Programming,
K. E. Kinnear, Jr., Ed., pp. 477–494, MIT Press, Cambridge,
Mass, USA, 1994.
[24] J. R. Koza, Genetic Programming II: Automatic Discovery of
Reusable Programs, MIT Press, Cambridge, Mass, USA, 1994.
[25] A. Teller and M. Veloso, PADO: learning tree structured algorithms for orchestration into an object recognition system,
Tech. Rep. CMU-CS-95-101, Department of Computer Science, Carnegie Mellon University, Pittsburgh, Pa, USA, 1995.
[26] G. Robinson and P. McIlroy, Exploring some commercial
applications of genetic programming, in Proc. AISB Workshop on Evolutionary Computing, T. C. Fogarty, Ed., vol. 993
of Lecture Notes in Computer Science (LNCS), pp. 234–264, Springer-Verlag, Sheffield, UK, April 1995.
[27] S. Isaka, An empirical study of facial image feature extraction
by genetic programming, in Late Breaking Papers at the 1997
Genetic Programming Conference, J. R. Koza, Ed., pp. 93–99,
Stanford Bookstore, Stanford, Calif, USA, July 1997.
[28] S. A. Stanhope and J. M. Daida, Genetic programming
for automatic target classification and recognition in synthetic aperture radar imagery, in Evolutionary Programming
VII: Proc. 7th Annual Conference on Evolutionary Programming, V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben,
Eds., vol. 1447 of Lecture Notes in Computer Science (LNCS),
pp. 735–744, Springer-Verlag, San Diego, Calif, USA, March
1998.
[29] K. Benson, Evolving finite state machines with embedded genetic programming for automatic target detection within SAR
imagery, in Proc. 2000 Congress on Evolutionary Computation
CEC00, pp. 1543–1549, IEEE Press, La Jolla, Calif, USA, July
2000.



[30] D. Howard, S. C. Roberts, and C. Ryan, The boru data
crawler for object detection tasks in machine vision, in Proc.
EvoWorkshops 2002, Applications of Evolutionary Computing,
S. Cagnoni, J. Gottlieb, E. Hart, M. Middendorf, and G. Raidl,
Eds., vol. 2279 of Lecture Notes in Computer Science (LNCS),
pp. 220–230, Springer-Verlag, Kinsale, Ireland, April 2002.
[31] B. J. Lucier, S. Mamillapalli, and J. Palsberg, Program optimization for faster genetic programming, in Proc. 3rd Annual Conference on Genetic Programming (GP-98), J. R. Koza,
W. Banzhaf, K. Chellapilla, et al., Eds., pp. 202–207, Morgan
Kaufmann, Madison, Wis, USA, July 1998.
[32] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge,
Mass, USA, 1992.
[33] J. R. Koza, Simultaneous discovery of reusable detectors
and subroutines using genetic programming, in Proc. 5th
International Conference on Genetic Algorithms, (ICGA 93),
S. Forrest, Ed., pp. 295–302, Morgan Kaufmann, Urbana-Champaign, Ill, USA, 1993.
[34] D. Howard, S. C. Roberts, and C. Ryan, Evolution of an object detection ant for image analysis, in Genetic and Evolutionary Computation Conference Late Breaking Papers, E. D.
Goodman, Ed., pp. 168–175, San Francisco, Calif, USA, July
2001.
[35] R. Poli, Genetic programming for image analysis, in Proc.
1st Annual Conference on Genetic Programming (GP-96), J. R.
Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, Eds., pp.
363–368, MIT Press, Stanford, Calif, USA, July 1996.
[36] F. Lindblad, P. Nordin, and K. Wolff, Evolving 3D model interpretation of images using graphics hardware, in Proc. 2002 Congress on Evolutionary Computation CEC2002, pp. 225–230, Honolulu, Hawaii, USA, May 2002.
[37] C. T. M. Graae, P. Nordin, and M. Nordahl, Stereoscopic vision for a humanoid robot using genetic programming, in
Proc. EvoWorkshops 2000, Real-World Applications of Evolutionary Computing, S. Cagnoni, R. Poli, G. D. Smith, et al.,
Eds., vol. 1803 of Lecture Notes in Computer Science (LNCS),
pp. 12–21, Springer-Verlag, Edinburgh, Scotland, UK, April
2000.
[38] P. Nordin and W. Banzhaf, Programmatic compression of
images and sound, in Proc. 1st Annual Conference on Genetic
Programming (GP-96), J. R. Koza, D. E. Goldberg, D. B. Fogel,
and R. L. Riolo, Eds., pp. 345–350, MIT Press, Stanford, Calif,
USA, July 1996.
[39] N. Rai, Pixel statistics in neural networks for domain independent object detection, Minor thesis, Department of Computer Science, Faculty of Applied Science, RMIT University,
2001.
[40] M. Fuchs, Crossover versus mutation: an empirical and theoretical case study, in Proc. 3rd Annual Conference on Genetic Programming (GP-98), J. R. Koza, W. Banzhaf, K. Chellapilla, et al., Eds., pp. 7885, Morgan Kaufmann, Madison,
Wis, USA, July 1998.
[41] K. Harries and P. Smith, Exploring alternative operators and
search strategies in genetic programming, in Proc. 2nd Annual Conference on Genetic Programming (GP-97), J. R. Koza,
K. Deb, M. Dorigo, et al., Eds., pp. 147–155, Morgan Kaufmann, Stanford, Calif, USA, July 1997.
[42] P. Wilson, Development of genetic programming strategies
for use in the robocup domain, Tech. Rep., Department of
Computer Science, RMIT, 1998, Honours thesis.

Mengjie Zhang received a B.E. (mechanical engineering) and an M.E. (computer
applications) in 1989 and 1992 from the
Department of Mechanical and Electrical Engineering, Agricultural University of
Hebei, China, and a Ph.D. in computer
science from RMIT University, Melbourne,
Australia, in 2000. During 1992–1995, he
worked at the Artificial Intelligence Research Centre, Agricultural University of
Hebei, China. In 2000, he moved to Victoria University of Wellington, New Zealand. His research is focused on data mining, machine
learning, and computer vision, particularly genetic programming,
neural networks, and object detection. He is also interested in web
information extraction, and knowledge-based systems.
Victor B. Ciesielski received his B.S. and
M.S. degrees in 1972 and 1975, respectively,
from the University of Melbourne, Australia
and his Ph.D. degree in 1980 from Rutgers University, USA. He is currently Associate Professor at the School of Computer Science and Information Technology,
RMIT University, where he heads the Evolutionary Computation and Machine Learning Group. Dr. Ciesielski's research interests
include evolutionary computation, computer vision, data mining,
machine learning for robot soccer, and, in particular, genetic programming approaches to object detection and classification.
Peter Andreae received a B.E. (honours) in
electrical engineering from the University
of Canterbury, New Zealand, in 1977 and
a Ph.D. in artificial intelligence from MIT
in 1985. Since 1985, he has been teaching
computer science at Victoria University of
Wellington, New Zealand. His research interests are centered in the area of making
agents that can learn behaviour from experience, but he has also worked on a wide
range of topics, including reconstructing vasculature from x-rays, clustering algorithms, analysis of micro-array data, programming by demonstration, and software reuse.