Editor-in-Chief
Marc Moonen, Belgium
Associate Editors
Kiyoharu Aizawa, Japan
Gonzalo Arce, USA
Jaakko Astola, Finland
Kenneth Barner, USA
Mauro Barni, Italy
Sankar Basu, USA
Jacob Benesty, Canada
Helmut Bölcskei, Switzerland
Chong-Yung Chi, Taiwan
M. Reha Civanlar, Turkey
Tony Constantinides, UK
Luciano Costa, Brazil
Zhi Ding, USA
Petar M. Djurić, USA
Jean-Luc Dugelay, France
Tariq Durrani, UK
Touradj Ebrahimi, Switzerland
Sadaoki Furui, Japan
Moncef Gabbouj, Finland
Fulvio Gini, Italy
Contents
Foreword, David E. Goldberg
Volume 2003 (2003), Issue 8, Pages 731-732
Editorial, Riccardo Poli and Stefano Cagnoni
Volume 2003 (2003), Issue 8, Pages 733-739
Blind Search for Optimal Wiener Equalizers Using an Artificial Immune Network Model,
Romis Ribeiro de Faissol Attux, Murilo Bellezoni Loiola, Ricardo Suyama, Leandro Nunes de Castro,
Fernando José Von Zuben, and João Marcos Travassos Romano
Volume 2003 (2003), Issue 8, Pages 740-747
Evolutionary Computation for Sensor Planning: The Task Distribution Plan, Enrique Dunn
and Gustavo Olague
Volume 2003 (2003), Issue 8, Pages 748-756
An Evolutionary Approach for Joint Blind Multichannel Estimation and Order Detection,
Chen Fangjiong, Sam Kwong, and Wei Gang
Volume 2003 (2003), Issue 8, Pages 757-765
Application of Evolution Strategies to the Design of Tracking Filters with a Large Number of
Specifications, Jesús García Herrero, Juan A. Besada Portas, Antonio Berlanga de Jesús,
José M. Molina López, Gonzalo de Miguel Vela, and José R. Casar Corredera
Volume 2003 (2003), Issue 8, Pages 766-779
Tuning Range Image Segmentation by Genetic Algorithm, Gianluca Pignalberi, Rita Cucchiara,
Luigi Cinque, and Stefano Levialdi
Volume 2003 (2003), Issue 8, Pages 780-790
Parameter Estimation of a Plucked String Synthesis Model Using a Genetic Algorithm with Perceptual
Fitness Calculation, Janne Riionheimo and Vesa Välimäki
Volume 2003 (2003), Issue 8, Pages 791-805
Optimization and Assessment of Wavelet Packet Decompositions with Evolutionary Computation,
Thomas Schell and Andreas Uhl
Volume 2003 (2003), Issue 8, Pages 806-813
On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition
Systems in Adverse Conditions, Sid-Ahmed Selouani and Douglas O'Shaughnessy
Volume 2003 (2003), Issue 8, Pages 814-823
Evolutionary Techniques for Image Processing a Large Dataset of Early Drosophila Gene Expression,
Alexander Spirov and David M. Holloway
Volume 2003 (2003), Issue 8, Pages 824-833
A Comparison of Evolutionary Algorithms for Tracking Time-Varying Recursive Systems,
Michael S. White and Stuart J. Flockton
Volume 2003 (2003), Issue 8, Pages 834-840
Foreword
David E. Goldberg
Department of General Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Email: deg@uiuc.edu
Editorial
Riccardo Poli
Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK
Email: rpoli@essex.ac.uk
Stefano Cagnoni
Department of Computer Engineering, University of Parma, 43100 Parma, Italy
Email: cagnoni@ce.unipr.it
1. INTRODUCTION
2.

Nature               Computer
Individual           Solution to a problem
Population           Set of solutions
Fitness              Quality of a solution
Chromosome           Representation for a solution (e.g., set of parameters)
Gene                 Part of the representation
Crossover            Search operator
Mutation             Search operator
Natural selection    Promoting the reuse of good (sub-)solutions

Representations
[Figure: crossover on binary strings, showing parents and offspring for (a) one-point crossover (e.g., parents 101010 1010 and 111000 1110 exchange tails to give offspring 101010 1110 and 111000 1010), (b) two-point crossover, and (c) a further variant.]

p_i = f_i / Σ_j f_j = f_i / (f_1 + · · · + f_N).    (1)

(2)
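Equation (1), fitness-proportionate selection, is commonly implemented as roulette-wheel selection. A minimal sketch (the population and fitness values are hypothetical):

```python
import random

def roulette_select(population, fitnesses):
    """Pick one individual with probability p_i = f_i / sum_j f_j."""
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    acc = 0.0
    for ind, f in zip(population, fitnesses):
        acc += f
        if r <= acc:
            return ind
    return population[-1]  # guard against floating-point round-off

# Hypothetical example: four candidate solutions and their fitness values.
pop = ["A", "B", "C", "D"]
fit = [1.0, 2.0, 3.0, 4.0]
random.seed(0)
counts = {p: 0 for p in pop}
for _ in range(10000):
    counts[roulette_select(pop, fit)] += 1
# "D" holds 4/10 of the total fitness, so it is chosen roughly 40% of the time.
```

Fitter individuals are thus selected more often, but every individual with nonzero fitness retains some chance of reproducing.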
[Figure: bit-flip mutation of a binary string; the bit at the mutation site is flipped, so 1010101010 becomes 1011101010.]

(3)

[Figure 3 panels: (a), (b) an offspring produced from Parent 1 and Parent 2 in parameter space; (c) an individual displaced by a random mutation.]

(4)
Mutation is often seen as the addition of a small random variation (e.g., Gaussian noise) to a point in a multidimensional
space (see Figure 3c).
Figure 3: (a), (b) crossover operators and (c) mutation for real-valued GAs.
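These real-valued operators can be sketched as follows; the crossover shown is simple arithmetic blending, one plausible instance of the geometric crossover suggested by the figure, and all names and values are hypothetical:

```python
import random

def arithmetic_crossover(parent1, parent2):
    """Blend two real-valued parents component-wise (one crossover variant)."""
    alpha = random.random()  # blending factor in [0, 1]
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(parent1, parent2)]

def gaussian_mutation(individual, sigma=0.1):
    """Add a small Gaussian random displacement to each parameter."""
    return [x + random.gauss(0.0, sigma) for x in individual]

random.seed(1)
p1, p2 = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
child = arithmetic_crossover(p1, p2)  # each gene lies between the parents
mutant = gaussian_mutation(child)     # the child displaced by Gaussian noise
```

With this blend, every offspring gene lies on the segment between the corresponding parent genes, while mutation explores a small neighbourhood around the offspring.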
Artificial immune systems (see [7, Part III, Chapters 10–13] or [8] for an extensive introduction) are distributed computational systems inspired by biological immune systems, which can recognise patterns and can remember previously seen patterns in an efficient and effective way. These systems are very close relatives of EAs (sometimes involving an evolutionary process in their inner mechanics), although they use a different biological metaphor.
3.
In their paper entitled Blind search for optimal Wiener equalizers using an artificial immune network model, Attux et al.
exploit recent advances in the field of artificial immune systems to obtain optimum equalisers for noisy communication
channels, using a technology that does not require the availability of clean samples of the input signal. This approach is
very successful in a variety of test equalisation problems. The
approach is also compared with a more traditional EA, a GA
with niching, showing superior performance.
The paper by Dunn and Olague, entitled Evolutionary
computation for sensor planning, shows how well-designed
evolutionary computation techniques can solve the problem
of optimally specifying sensing tasks for a workcell provided
with multiple manipulators and cameras. The problem is
NP-hard, effectively being a composition of a set-partitioning problem and multiple traveling-salesperson problems.
Nonetheless, thanks to clever representations and the use of
evolutionary search, this system is able to solve the problem,
providing solutions of quality very close to that of the solutions obtained via exhaustive search, but in a tiny fraction of
the time.
The paper entitled An evolutionary approach for joint blind
multichannel estimation and order detection by Fangjiong et
al. presents a method for the detection of the order and
the estimation of the parameters of a single-input multiple-output channel. The method is based on a hybrid GA with
specially designed operators. The method shows performances comparable with existing closed-form approaches
which, however, are much more restricted in that they either
assume that the channel order is known or treat the problems
of order estimation and parameter estimation separately.
In Application of evolution strategies to the design of tracking filters with a large number of specifications, Herrero et al. attack the problem of tracking civil aircraft from radar information within the extremely tight performance constraints imposed by a civil aviation standard. They use interacting multiple model filters optimised using an ES and a multiobjective optimisation approach, obtaining a high-performance aircraft tracker.
Making EAs more accessible and easier to apply for general practitioners by self-tuning their parameters is one of the main aims with which Pignalberi et al. developed GASE, a GA-based tool for range image segmentation. The system, along with some practical results, is described in the paper Tuning range image segmentation by genetic algorithm. A multiobjective fitness function is adopted to take into consideration problems that are typically encountered in range image segmentation.
The paper Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness
calculation describes the use of GAs to estimate the control parameters for a widely used plucked string synthesis
model. Using GAs, Riionheimo and Välimäki have been able
to automate parameter extraction, which had been formerly
achieved only through semiautomatic approaches, obtaining
comparable results, both in quantitative and in qualitative
terms. An interesting feature of the approach is the inclusion of knowledge about perceptual properties of the human
hearing system into the fitness function.
Schell and Uhl compare results obtained with a GA-based approach to the near-best-basis (NBB) algorithm, a well-known suboptimal algorithm for wavelet packet decomposition. In their paper Optimization and assessment of wavelet packet decompositions with evolutionary computation, they highlight the problem of finding good cost functions in terms of correlation with actual image quality. They show that GAs provide lower-cost solutions that, however, provide lower-quality images than NBB.
In the paper entitled On the use of evolutionary algorithms
to improve the robustness of continuous speech recognition
systems in adverse conditions, Selouani and O'Shaughnessy
show how a GA can tune a system based on state-of-the-art
speech recognition technology so as to maximise its recognition accuracy in the presence of severe noise. This hybrid
of evolution and conventional signal processing algorithms
amply outperforms nonadaptive systems. The EA used is a
GA with real-coded representation, rank selection, a heuristic type of crossover, and a nonuniform mutation operator.
The paper Evolutionary techniques for image processing a large dataset of early Drosophila gene expression by Spirov and Holloway describes an evolutionary approach to processing confocal microscopy images of patterns of activity for genes governing early Drosophila development. The problem is approached using plain GAs, a simplex approach, and a hybrid between these two.
The use of GAs to track time-varying systems based on
recursive models is tackled in A comparison of evolutionary
algorithms for tracking time-varying recursive systems. The paper first compares a plain GA with a GA variant, called random immigrant strategy, showing that the latter performs
better in tracking time-varying systems even if it has problems with fast-varying systems. Finally, a hybrid combination
of GAs and local search that is able to tackle even such hard
tasks is proposed.
Zhang et al., in their paper A domain-independent window approach to multiclass object detection using genetic programming, propose an interesting approach in which GP is used to both detect and localise features of interest. The approach is compared with a neural network classifier, used as reference, showing that GP-evolved programs can provide significantly lower false-alarm rates. Within the proposed approach, the choice of the primitive set is also discussed, comparing results obtained with two different sets: one comprises only the four basic arithmetical operators, and
4. CONCLUSIONS

APPENDIX
POINTERS TO FURTHER READING IN GEC
1. David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley,
Reading, Massachusetts, 1989. A classic book on genetic algorithms and classifier systems.
2. David E. Goldberg. The Design of Innovation: Lessons from and for Competent Genetic Algorithms. Kluwer Academic Publishers, Boston, 2002. An excellent, long-awaited follow-up of Goldberg's first book.
3. Melanie Mitchell, An introduction to genetic algorithms,
A Bradford Book, MIT Press, Cambridge, MA, 1996. A
good introduction to genetic algorithms.
4. John H. Holland, Adaptation in Natural and Artificial
Systems, second edition, A Bradford Book, MIT Press,
Cambridge, MA, 1992. Second edition of a classic from
the inventor of genetic algorithms.
5. Thomas Bäck and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, vol. 1, no. 1, pp. 1–23, 1993. A good introduction to parameter optimisation using EAs.
6. T. Bäck, D. B. Fogel, and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators, Institute of Physics Publishing, 2000. A modern introduction to evolutionary algorithms. Good both for novices and more expert readers.
7. John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT
Press, 1992. The bible of genetic programming by the
founder of the field. Followed by GP II (1994), GP III
(1999), and GP IV (forthcoming).
8. Wolfgang Banzhaf, Peter Nordin, Robert E. Keller, and Frank D. Francone, Genetic Programming: An Introduction; On the Automatic Evolution of Computer Programs and Its Applications, Morgan Kaufmann, 1998. An excellent textbook on GP.
9. W. B. Langdon and Riccardo Poli, Foundations of Genetic Programming, Springer, February 2002. The only book entirely devoted to the theory of GP and its relations with the GA theory.
10. Proceedings of the International Conference on Genetic Algorithms (ICGA). ICGA is the oldest conference on EAs.
11. Proceedings of the Genetic Programming Conference. This was the first conference entirely devoted to GP.
12. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). Born in 1999 from the recombination of ICGA and the GP conference mentioned above, GECCO is the largest conference in the field.
13. Proceedings of the Foundations of Genetic Algorithms (FOGA) Workshop. FOGA is a biennial, small but very prestigious and highly selective workshop. It is mainly devoted to the theoretical foundations of EAs.
14. Proceedings of the Congress on Evolutionary Computation (CEC). CEC is a large conference under the patronage of the IEEE.
15. Proceedings of Parallel Problem Solving from Nature (PPSN). This is a large biennial European conference, probably the oldest of its kind in Europe.
16. Proceedings of the European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP). This is a small workshop, reaching its fifth edition in 2003. It is the only event worldwide uniquely devoted to the research topics covered by this special issue.
17. Proceedings of the European Conference on Genetic Programming. EuroGP was the first European event entirely devoted to GP. Run as a workshop in 1998 and 1999, it became a conference in 2000. It has now reached its sixth edition with EuroGP 2003 held at the University of Essex. Currently, this is the largest event worldwide solely devoted to GP.
ACKNOWLEDGMENTS
The guest editors would like to thank Professor David E.
Goldberg for his insightful foreword, the former and present
editors-in-chief of EURASIP JASP, Professor K. J. Ray Liu
and Professor Marc Moonen, for their support in putting together this special issue, and all the reviewers who have generously devoted their time to help ensure the highest possible
quality for the papers in this volume. All the authors of the
manuscripts who have contributed to this special issue are
also warmly thanked.
Riccardo Poli
Stefano Cagnoni
REFERENCES
[1] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[2] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge,
Mass, USA, 1992.
Editorial
[3] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, FrommannHolzboog, Stuttgart, Germany, 1973.
[4] H.-P. Schwefel, Numerical Optimization of Computer Models,
Wiley, Chichester, UK, 1981.
[5] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press,
Cambridge, Mass, USA, 1996.
[6] W. B. Langdon and R. Poli, Foundations of Genetic Programming, Springer-Verlag, New York, NY, USA, 2002.
[7] D. Corne, M. Dorigo, and F. Glover, Eds., New Ideas in Optimization, McGraw-Hill, London, UK, 1999.
[8] D. Dasgupta, Ed., Artificial Immune Systems and Their Applications, Springer-Verlag, New York, NY, USA, 1999.
Riccardo Poli received a Ph.D. degree in
bioengineering (1993) from the University
of Florence, Italy, where he worked on image analysis, genetic algorithms, and neural networks until 1994. From 1994 to 2001,
he was a lecturer and then a reader in
the School of Computer Science of the
University of Birmingham, UK. In 2001,
he became a Professor at the Department
of Computer Science of the University of
Essex, where he founded the Natural and Evolutionary Computation Group. Professor Poli has published around 130 papers on
evolutionary algorithms, genetic programming, neural networks,
and image analysis and signal processing, including the book Foundations of Genetic Programming (Springer, 2002). He has been
Co-chair of EuroGP, the European Conference on GP, in 1998, 1999, 2000, and 2003. He was Chair of the GP theme at the Genetic and Evolutionary Computation Conference (GECCO) 2002 and Co-chair of the Foundations of Genetic Algorithms Workshop (FOGA) 2002. He will be General Chair of GECCO 2004. Professor Poli is an Associate Editor of Evolutionary Computation (MIT
Press) and Genetic Programming and Evolvable Machines (Kluwer),
a reviewer for 12 journals, and has been a programme committee
member of 40 international events.
Stefano Cagnoni has been an Assistant Professor in the Department of Computer Engineering of the University of Parma since
1997. He received the Ph.D. degree in bioengineering in 1993. In 1994, he was a
Visiting Scientist at the Whitaker College
Biomedical Imaging and Computation Laboratory at the Massachusetts Institute of
Technology. His main research interests are
in computer vision, evolutionary computation, and robotics. As a member of EvoNet, the European Network of Excellence in Evolutionary Computation, he has chaired
the EvoIASP working group on evolutionary computation in image analysis and signal processing, and the corresponding workshop since its first edition in 1999. He is a reviewer for several journals and a programme committee member of several international
events.
Ricardo Suyama
DSPCOM, DECOM, FEEC, State University of Campinas, C.P. 6101, Campinas, SP, Cep 13083-970, Brazil
Email: rsuyama@decom.fee.unicamp.br
1. INTRODUCTION
The main goal of communications engineering is to provide adequate message interchange, through a certain channel, between a transmitter and a receiver. Nevertheless, the channel introduces distortion in the transmitted message, which usually leads to severe degradation. A device named
equalizer filters the received signal in order to recover the
desired information. Figure 1 depicts the schematic channel
and equalizer representation in a communication system, together with their respective input and output signals.
From Figure 1, it can be inferred that the main goal of the
equalizer is to obtain an output signal as similar as possible
to the transmitted signal, except for a gain K and a delay d,
that is,
y(n) = K s(n − d),
(1)
[Figure 1: cascade of channel and equalizer; the transmitted signal s(n) enters the channel, whose output x(n) is filtered by the equalizer to produce y(n).]
(2)

J_W = E[ |s(n − d) − y(n)|² ],    (3)

J_CM = E[ (R2 − |y(n)|²)² ],    (4)

R2 = E[ |s(n)|⁴ ] / E[ |s(n)|² ].    (5)
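The CM cost can be estimated from data by replacing the expectation with a sample average. A minimal sketch (the names are hypothetical; it assumes a real-valued FIR equalizer and, for the example, BPSK symbols over an identity channel):

```python
import numpy as np

def cm_cost(w, x, R2):
    """Sample estimate of J_CM = E[(R2 - |y(n)|^2)^2] for equalizer taps w."""
    y = np.convolve(x, w, mode="valid")      # equalizer output y(n)
    return float(np.mean((R2 - np.abs(y) ** 2) ** 2))

def dispersion_constant(s):
    """R2 = E[|s(n)|^4] / E[|s(n)|^2], computed from transmitted symbols."""
    return float(np.mean(np.abs(s) ** 4) / np.mean(np.abs(s) ** 2))

# Hypothetical example: BPSK symbols through an identity channel.
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=1000)
R2 = dispersion_constant(s)   # equals 1 for BPSK
cost = cm_cost([1.0], s, R2)  # a perfect single-tap equalizer gives zero cost
```

Such a sample cost is what a blind search procedure (e.g., an EA) would evaluate for each candidate tap vector, since it requires no clean reference signal.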
(6)
Together with many other bodily systems, such as the nervous and the endocrine systems, the immune system plays a
major role in maintaining life. Its primary functions are to
defend the body against foreign invaders (e.g., viruses, bacteria, and fungi) and to eliminate malfunctioning self
cells and debris.
In [10], de Castro and Von Zuben proposed an artificial immune network model, named aiNet, inspired by the clonal
selection and network theories of the immune system. This
algorithm is demonstrated to be suitable to perform data
(2.4)
(2.5)
(2.6)
(2.7)

(1/β) exp(−f)    (7)
number of network individuals, named memory
cells, after suppression.
(2.8) Diversity introduction. Introduce a percentage
d% of randomly generated individuals and return to step (2).
(3) EndWhile.
The original stopping criterion proposed for the algorithm is based on the number of memory cells. After the
network interactions (step (2.7)), a certain number of individuals remain. If this number does not vary from one iteration to the next, then the network is said to have a stable population size. In such a condition, the remaining individuals are all memory cells corresponding to locally optimal solutions.
However, in accordance with the classical modus operandi in
adaptive equalization, a maximum number of iterations was
adopted as the stopping criterion.
For a more computational description of the immune algorithm presented, the reader is invited to visit the website http://www.cs.ukc.ac.uk/people/staff/jt6/aisbook/aisimplementations.htm, from which the original Matlab code for the opt-aiNet and many other immune algorithms can be downloaded.
3.3.
(8)
to be maximized.
Figure 2a depicts f (x, y) and an initial population of 13
individuals after the local search part of the algorithm was
completed for the first time (steps (2.1) to (2.6)). Note that
all the remaining 13 individuals are positioned on peaks of the
function. Figure 2b depicts the function to be optimized after
the convergence of the algorithm. In this case, nearly all peaks
[Figure 2: surface plots of f(x, y) over the (x, y) domain: (a) the population after the first pass of the local search steps, with individuals located on peaks; (b) the function after convergence of the algorithm.]
SIMULATION RESULTS
In order to evaluate the performance of the opt-aiNet algorithm when applied to search for the optimal Wiener equalizers, three different channels (C1, C2, and C3) were
[Tables: simulation parameter values 5, 0.35, 10, 50, 1000, and 100; per-solution values 0.0917 (8%), 0.0918 (2%), 0.1022 (2%), 0.1890 (4%), and 0.1951 (4%) for solutions Wopt, W2, W3, W4, and W5.]

Solution    Residual MSE    Freq. (opt-aiNet)
Wopt        0.0071          66%
W2          0.0075          32%
W3          0.0104          2%

Solution    Residual MSE    Freq. (opt-aiNet)
Wopt        0.0071          84%
W2          0.0075          16%
(9)

1 / (1 + J_CM)    (10)
ACKNOWLEDGMENTS

[Table: solutions found by the algorithm (Wopt, W2 to W7), each paired with the Wiener solution it is close to, with coefficient-vector tails such as 0.1882 0.0860 against 0.1862 0.0781, and 0.2622 0.5315 against 0.2460 0.5218.]
This work started by claiming that there is a strong relationship between the CM global optima and some of the Wiener solutions, so that such solutions can be attained by refining the CM minima using a simple DD technique. On the other hand, the CM global optimum can be easily reached by means of a blind search procedure, such as an EA. Therefore, the combination of the CM criterion with an efficient global search procedure gives rise to a framework to design optimal Wiener filters. This is the core of our proposal.
Our approach uses an immune-based algorithm, named opt-aiNet, to optimize the parameters of the equalizer, and benchmarks its performance against those obtained by using a genetic algorithm with niching. Different channels and
REFERENCES
[1] D. N. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Trans. Communications, vol. 28, no. 11, pp. 1867–1875, 1980.
[2] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, USA, 3rd edition, 1996.
[3] C. R. Johnson, P. Schniter, T. J. Endres, J. D. Behm, D. R. Brown, and R. A. Casas, "Blind equalization using the constant modulus criterion: a review," Proceedings of the IEEE, vol. 86, no. 10, pp. 1927–1950, 1998.
[4] H. Zeng, L. Tong, and C. R. Johnson, "An analysis of constant modulus receivers," IEEE Trans. Signal Processing, vol. 47, no. 11, pp. 2990–2999, 1999.
[5] L. N. de Castro and J. Timmis, "An artificial immune network for multimodal function optimization," in Proc. IEEE Congress of Evolutionary Computation (CEC '02), vol. 1, pp. 699–704, Honolulu, Hawaii, USA, May 2002.
[6] A. M. Costa, R. R. F. Attux, and J. M. T. Romano, "A new method for blind channel identification with genetic algorithms," in Proc. IEEE International Telecommunications Symposium, Natal, Brazil, September 2002.
[7] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer-Verlag, London, UK, 2002.
[8] F. M. Burnet, The Clonal Selection Theory of Acquired Immunity, Cambridge University Press, Cambridge, UK, 1959.
[9] N. K. Jerne, "Towards a network theory of the immune system," Ann. Immunol. (Inst. Pasteur), vol. 125C, pp. 373–389, 1974.
[10] L. N. de Castro and F. J. Von Zuben, "aiNet: an artificial immune network for data analysis," in Data Mining: A Heuristic Approach, Chapter XII, pp. 231–259, Idea Group Publishing, Hershey, Pa, USA, 2001.
[11] T. Bäck, D. B. Fogel, and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators, Institute of Physics Publishing (IOP), Bristol, UK, 2000.
[12] W. Atmar, "Notes on the simulation of evolution," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 130–148, 1994.
[13] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, Mass, USA, 2nd edition, 1992.
Fernando José Von Zuben received his B.S.
degree in electrical engineering in 1991. In
1993, he received his M.S. degree and his
Ph.D. degree in 1996, both in automation,
from the Faculty of Electrical and Computer Engineering at the State University of
Campinas, SP, Brazil. Since 1997, he has been an Assistant Professor in the Department of Computer Engineering and Industrial Automation at the State University of Campinas, SP, Brazil. The main topics of his research are artificial neural networks, artificial immune systems, evolutionary algorithms,
nonlinear control systems, and multivariate data analysis. F. Von
Zuben is a member of IEEE, INNS, and AAAI.
João Marcos Travassos Romano was born
in Rio de Janeiro in 1960. He received the
B.S. and M.S. degrees in electrical engineering from the State University of Campinas
(Unicamp) in Brazil in 1981 and 1984, respectively. In 1987, he received the Ph.D. degree from University of Paris XI. In 1988, he
joined the Communications Department of
the Faculty of Electrical and Computer Engineering, Unicamp, where he is now a Professor. He served as an Invited Professor at the University René Descartes in Paris during the winter of 1999, and at the Communications and Electronic Laboratory at CNAM, Paris, during the winter of 2002. He is in charge of the Signal Processing for Communications Laboratory. His research interests concern adaptive and intelligent signal processing and its applications to telecommunications problems such as channel equalization and smart antennas. Since 1988, he has been a recipient of the Research Fellowship of CNPq, Brazil. Professor Romano is a member of the IEEE Electronics and Signal Processing Technical Committee and an IEEE Senior Member. Since April 2000, he has been the President of the Brazilian Communications Society (SBrT), a Sister Society of ComSoc, IEEE.
Gustavo Olague
Departamento de Ciencias de la Computación, División de Física Aplicada, Centro de Investigación Científica y
de Educación Superior de Ensenada, 22860 Ensenada, BC, Mexico
Email: olague@cicese.mx
Received 29 June 2002 and in revised form 29 November 2002
Autonomous sensor planning is a problem of interest to scientists in the fields of computer vision, robotics, and photogrammetry. In automated visual tasks, a sensing planner must make complex and critical decisions involving sensor placement and the
sensing task specification. This paper addresses the problem of specifying sensing tasks for a multiple manipulator workcell given
an optimal sensor placement configuration. The problem is conceptually divided into two different phases: activity assignment and tour planning. To solve such problems, an optimization methodology based on evolutionary computation is developed. Operational limitations originating from the workcell configuration are considered using specialized heuristics as well as a floating-point
representation based on the random keys approach. Experiments and performance results are presented.
Keywords and phrases: sensor planning, evolutionary computing, combinatorial optimization, random keys.
1. INTRODUCTION

2. PROBLEM STATEMENT
Definition 1 (Photogrammetric network). A photogrammetric network is represented as an ordered set V of n three-dimensional viewpoints. Each individual viewpoint is expressed as Vj, where j ranges from j = 1 to n.
Definition 2 (Robot workcell). A multirobot active vision
system is represented by an ordered set R consisting of r
robots in the workcell. Each individual robot is expressed by
Ri , where i ranges from i = 1 to r.
Definition 3 (Operational environment). Each robot has an
operational restricted physical space denoted by Oi , where i
ranges from i = 1 to r.
Accordingly, the problem statement can be expressed as
follows.
Definition 4 (Task distribution problem). Find a set of r ordered subsets Xi ⊆ V, where V = ⋃(i=1..r) Xi and each Vj ∈ Xi satisfies Vj ∈ Oi, such that the total length traveled by the robots is minimized.
From the above definitions, the activity assignment problem relates each of the n elements of V with one of the
r possible elements of R. Considering that each robot Ri
has assigned ni viewpoints, a problem of sequencing the
viewpoints emerges, which we call tour planning. Our goal is
to find the best combination of activity assignment and tour
planning in order to optimize the overall operational cost
of the task distribution. This total operational cost is produced by adding individual tour costs, Qi , defined by the Euclidean distance that each robot needs to travel in straight
lines among the different viewpoints. Hence, the criterion is represented as QT = Σ(i=1..r) Qi. Such a problem statement
yields a combinatorial problem which is computationally
NP-hard and requires the use of special heuristics in order
to avoid an exhaustive search.
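The criterion QT, the sum of the individual straight-line tour lengths Qi, can be sketched as follows (the coordinates and tour split are hypothetical):

```python
import math

def tour_length(points):
    """Q_i: Euclidean length of a straight-line tour through the viewpoints."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def total_cost(tours):
    """Q_T: sum of the individual tour costs Q_i over all robots."""
    return sum(tour_length(t) for t in tours)

# Hypothetical 3D viewpoints split between two robots.
t1 = [(0, 0, 0), (3, 4, 0)]             # Q_1 = 5
t2 = [(0, 0, 0), (0, 0, 2), (0, 0, 5)]  # Q_2 = 2 + 3 = 5
qt = total_cost([t1, t2])               # Q_T = 10
```

An evolutionary search then only needs this cheap evaluation per candidate task distribution, instead of enumerating all assignments and sequences.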
3.
Figure 4: Operational restrictions. The workcell configuration imposes accessibility restrictions. Hence, when a robot's reach is limited,
it is possible to reduce the search space for the activity assignment
phase.
Table 1: Structure ACCESSIBILITY containing the number and the
list of robots capable of reaching a particular viewpoint.
Viewpoint ID    Number of robots    List of robots
V1              r1                  RobID1, ..., RobIDr1
...             ...                 ...
Vn              rn                  RobID1, ..., RobIDrn
Solution representation

Table 2: Structure TASKS containing the list of viewpoints comprising each robot tour Ti.

S = (0.84, 0.51, 0.18, 0.15, 0.41).    (1)
The smallest value in S is found at the fourth position, denoted by S4 . Therefore, V4 is the first viewpoint in the resulting permutation P. The second smallest value is found in the
third position S3 , making V3 the second viewpoint in P, and
so on. The resulting permutation of the five viewpoints is
P = (V4, V3, V5, V2, V1).
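The decoding just described, smallest key first, amounts to sorting the viewpoint indices by their random-key values. A minimal sketch (the key values are hypothetical, chosen to reproduce the stated ordering):

```python
def decode_random_keys(S):
    """Return the viewpoint permutation encoded by the random-key vector S:
    the position of the smallest key comes first, and so on."""
    order = sorted(range(len(S)), key=lambda j: S[j])
    return [f"V{j + 1}" for j in order]

# Hypothetical keys: the smallest is S4, then S3, S5, S2, S1.
S = [0.84, 0.51, 0.18, 0.15, 0.41]
P = decode_random_keys(S)  # ['V4', 'V3', 'V5', 'V2', 'V1']
```

Because any real-valued vector decodes to a valid permutation, standard real-coded crossover and mutation can be applied to S without producing infeasible tours; this is the main appeal of the random keys approach.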
(2)

[Table 2 rows: for each robot, the number of viewpoints v1, ..., vr in its tour.]
subsection presents the heuristics used by our method to obtain such task distribution description.
3.3. Search heuristics
A solution representation S needs to be evaluated. Such evaluation is applied to the task distribution description contained in TASKS. Hence, a mapping M : S → TASKS is necessary. The mapping M assigns and sequences the viewpoints among the different robots and stores the results in
the structure TASKS. The mapping M makes use of the solution representation data structures S and TASKS, as well as
the precomputed operational restrictions stored in ACCESSIBILITY. The two distinct phases of activity assignment and
tour planning are presented separately.
3.3.1. Activity assignment
The activity assignment problem allocates each of the viewpoints V j to one of the possible robots. The goal is to provide
an initial unsequenced set of individual robot tours Ti using
the following steps.
Step 1. Obtain the number rj of robots capable of reaching that particular viewpoint by consulting the ACCESSIBILITY structure (see Table 1).
Step 2. Divide the interval (0, 1) into rj equally distributed segments in order to determine the size of a comparison segment, Seg = 1/rj.
Step 3. Calculate in which segment k the random value Sj resides, that is, k = Int(Sj / Seg) + 1.
Step 4. Assign the viewpoint Vj to the kth robot in the corresponding entry in the ACCESSIBILITY structure. In this way, the assigned robot index i is given by RobIDk, which is found in the entry that corresponds to Vj inside the ACCESSIBILITY table.
Step 5. Append Vj to the list of viewpoints Ti assigned to the ith robot. The tour description Ti is stored in the TASKS structure.
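The steps above can be sketched as follows; the accessibility lists, keys, and dictionary are hypothetical stand-ins for the ACCESSIBILITY and TASKS structures:

```python
def assign_viewpoints(S, accessibility):
    """Map each random key S_j to one of the robots able to reach viewpoint j:
    split (0, 1) into r_j equal segments and pick the segment containing S_j."""
    tasks = {}  # robot id -> unsequenced tour T_i (stands in for TASKS)
    for j, key in enumerate(S):
        robots = accessibility[j]          # Step 1: reachable robots, r_j of them
        seg = 1.0 / len(robots)            # Step 2: comparison segment size
        k = min(int(key / seg), len(robots) - 1)  # Step 3 (0-based, guards key=1.0)
        robot = robots[k]                  # Step 4: RobID_k for this viewpoint
        tasks.setdefault(robot, []).append(f"V{j + 1}")  # Step 5: append to T_i
    return tasks

# Hypothetical accessibility lists and keys for three viewpoints.
acc = [["R1", "R3", "R4"], ["R1", "R2"], ["R2", "R3", "R4"]]
tasks = assign_viewpoints([0.41, 0.10, 0.90], acc)
# 0.41 falls in segment 2 of 3 -> R3; 0.10 in segment 1 of 2 -> R1; 0.90 -> R4
```

Restricting the choice to the robots listed as able to reach each viewpoint is what guarantees that every decoded string is a feasible assignment.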
A graphical description of these heuristic steps is shown in Figure 6. The series of actions performed in the activity assignment phase is based on compliance with the operational restrictions and, in doing so, ensures that any codified string S yields a valid solution to the assignment problem. Based on this strategy, each possible codification string S has only one possible interpretation. After executing this series of steps, each viewpoint is assigned to a robot. The viewpoints assigned to a single robot Ri are grouped into a set Ti. Each
[Figure 6: Graphical description of the assignment mapping. The ACCESSIBILITY table lists, for each viewpoint V_j, the number of accessible robots r_j and the list of robot IDs RobID_1, . . . , RobID_rj; the gene value S_j selects one of the r_j segments of (0, 1) and hence a robot, and the TASKS structure stores, for each robot R_i, its number of views and its tour T_i.]
Figure 8: Tour planning. The smallest-value-first heuristic is applied to each robot tour considering the previously adjusted values
in S.
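The smallest-value-first sequencing can be sketched as follows; this is our illustrative reading of the heuristic, in which the viewpoints of a tour are visited in ascending order of their adjusted gene values:

```python
def order_tour(tour, S):
    """Sequence a robot's tour by ascending adjusted gene value."""
    return sorted(tour, key=lambda v: S[v])
```

For example, with adjusted values 0.24, 0.04, and 0.27 for V1, V3, and V8, the tour [V1, V3, V8] is resequenced to [V3, V1, V8].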
Robot = [R3, R3, R1, R1, R4, R4, R2, R2].    (4)
Figure 9: Eight viewpoints are to be distributed among four manipulators. Viewpoints are depicted as individual cameras, and solid lines connecting the cameras illustrate each robot tour corresponding to an optimal task distribution.
Viewpoint   Number of robots   List of robots
V1          r1 = 3             R1, R2, R3
V2          r2 = 3             R1, R2, R3
V3          r3 = 3             R1, R3, R4
V4          r4 = 3             R1, R3, R4
V5          r5 = 3             R1, R2, R4
V6          r6 = 3             R1, R2, R4
V7          r7 = 3             R2, R3, R4
V8          r8 = 3             R2, R3, R4
Experiment B
T1 = [V4, V3],  T2 = [V8, V7],  T3 = [V2, V1],  T4 = [V6, V5].
Figure 11: Best solution found by the genetic algorithm for the configuration shown in Figure 10.
photogrammetric network consisting of 13 cameras in an optimal manner, see Figure 10. Working with this fixed configuration, we executed several tests. First, to test our method's functionality, we executed the task distribution planner. Several possible solutions are obtained over the course of multiple executions; two such solutions are depicted in Figures 11 and 12. Notice that the best solution found, represented in Figure 11, does not incorporate all of the available robots. Figure 12 shows a more typical solution which is also found by our system.
In order to test the method's adaptability, two of the four manipulator robots were disabled. This additional restriction is reflected only in changes to the values stored in Table 5.
Viewpoint ID   Number of robots   List of robots
V1             r1 = 2             R2, R4
V2             r2 = 2             R2, R3
V3             r3 = 2             R1, R4
V4             r4 = 2             R1, R4
V5             r5 = 2             R1, R4
V6             r6 = 2             R2, R3
V7             r7 = 2             R2, R4
V8             r8 = 2             R2, R3
V9             r9 = 2             R1, R3
V10            r10 = 2            R1, R3
V11            r11 = 3            R1, R2, R3
V12            r12 = 3            R1, R2, R4
V13            r13 = 3            R1, R2, R4
The system is expected to distribute tasks among the two remaining robots. Results from such tests are shown in Figures 13 and 14. In these cases, the activity assignment problem becomes visually simpler to resolve, but the difficulty of the tour planning problem becomes more evident since each tour will consist of more viewpoints.
Since our approach is based on EC techniques, the determination of the task distribution plan is the product of the evolution process over a population of possible solutions. Therefore, the fitness values of each of these individuals, and of the population in general, reflect the effect of such evolution. In this way, the population fitness values evolve over the course of several generations until an optimal solution is found, see Figure 15. The stepwise decrements in the best fitness line point out the combinatorial aspect of our search, while the average fitness confirms the positive effect of the evolution process.
While great detail has been given to the special heuristics
used in our approach, the behavior of the curves presented in
Figure 13: Solution found by the system for the case where a pair of robots was disabled from the configuration shown in Figure 10.
Figure 14: An environment similar to Figure 13 showing the system's flexibility to changes in the workcell configuration.
Figure 16: Genetic algorithm performance over multiple executions. The obtained solutions are always better than a greedy search, reaching the global optimum 14 out of 50 times.
[Figure 15: Evolution of the worst, average, and best fitness of the population over 120 generations.]

5.
CONCLUSIONS

The development of an effective sensor planner for automated vision tasks implies the consideration of operational restrictions as well as the vision task's objectives. This work presents a solution for the task distribution problem inherent to multiple-robot workcells. The problem is conceptualized as two separate combinatorial problems: activity assignment and tour planning. A genetic algorithm-based strategy that concurrently solves these problems was presented along with experimental results. The approach employs auxiliary data structures in order to incorporate accessibility limitations and to specify a task distribution plan. The evolutionary nature of the optimization method allows multiple approximate solutions of the optimization problem to be found over the course of several executions. Performance considerations support the use of the proposed methodology compared to a greedy heuristic or an exhaustive search.
Future work can consider the robot motion planning problem that arises when there are obstacles in the environment or when the manipulators can collide with each other. Also, the representation scheme can be modified to use two values instead of adjusting the original representation string by heuristic means. Furthermore, the genetic operators can be modified in search of improving the evolutionary algorithm performance. Also, a rigorous analysis of the properties of the heuristics used is needed. At present, we are working toward a real implementation of our algorithms for intelligent sensor planning.
ACKNOWLEDGMENTS
This research was funded by Contract 35267-A from
CONACyT and under the LAFMI Project. The first author
was supported by scholarship 142987 from CONACyT. Figures 1, 2, 3, 4, 9, 10, 11, 12, 13, and 14 were generated with
software written at the Geometry Center. The authors thank
the anonymous reviewers for their suggestions which greatly
helped improve this paper.
Sam Kwong
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Email: cssamk@cityu.edu.hk
Wei Gang
Department of Electronic Engineering, South China University of Technology, Wushan, Guangzhou 510641, China
Email: ecgwei@scut.edu.cn
Received 30 May 2001 and in revised form 28 January 2003
A joint blind order-detection and parameter-estimation algorithm for a single-input multiple-output (SIMO) channel is presented. Based on the subspace decomposition of the channel output, an objective function including channel order and channel
parameters is proposed. The problem is resolved by using a specifically designed genetic algorithm (GA). In the proposed GA,
we encode both the channel order and parameters into a single chromosome, so they can be estimated simultaneously. Novel GA
operators and convergence criteria are used to guarantee correct convergence and high convergence speed. Simulation results show that the
proposed GA achieves satisfactory convergence speed and performance.
Keywords and phrases: genetic algorithms, SIMO, blind signal identification.
1.
INTRODUCTION
Many applications in signal processing encounter the problem of blind multichannel identification. Traditional methods of such identification usually apply higher-order statistics techniques. The major problems of these methods are
slow convergence and many local optima [1]. Since the original work of Tong et al. [1, 2], many lower-order statistics-based methods have been proposed for blind multichannel
identification (see [3] and references therein). A common
assumption in these methods is that the channel order is
known in advance. However, such information is, in fact,
not available. Thus, we are obliged to estimate the channel
order beforehand. Though many order-detection algorithms
can be applied (e.g., see [4]) to solve this particular problem,
the approaches that separate order detection and parameter
estimation may not be efficient, especially when the channel-impulse response has small head and tail taps [5].
To tackle this drawback, a class of channel-estimation algorithms performing joint order detection and parameter estimation has been proposed [5, 6]. In [5], a cost function in-
2.
PROBLEM FORMULATION
We consider a multichannel FIR system with M subchannels. The transmitted discrete signal s(n) is modulated, filtered, and transmitted over these Gaussian subchannels. The
received signals are filtered and down-band converted. The
resulting baseband signal at the mth sensor can be expressed
as follows [1]:
x_m(n) = \sum_{k=0}^{L} h_m(k) s(n − k) + b_m(n),    m = 1, . . . , M,    (1)
where b_m(n) denotes the additive Gaussian noise, assumed to be uncorrelated with the input signal s(n), h_m(n) is the equivalent discrete channel-impulse response associated with the mth sensor, and L is the largest order of these subchannels (note that the subchannels may have different orders). Equation (1) can be represented in vector-matrix formulation as follows:
x_m(n) = H_m s(n) + b_m(n),    m = 1, . . . , M,    (2)

where H = [H_1^T · · · H_M^T]^T is the M(N + 1) × (N + L + 1) overall system transfer matrix and b(n) = [b_1^T(n) · · · b_M^T(n)]^T is the M(N + 1) × 1 additive noise vector.
If we define the output-autocorrelation matrix as R_xx = E[x(n) x(n)^H], then we have

R_xx = H R_ss H^H + R_bb,    (3)

where the stacked observation and noise vectors are

x_m(n) = [x_m(n)  x_m(n − 1)  · · ·  x_m(n − N)]^T,    (4)
b_m(n) = [b_m(n)  b_m(n − 1)  · · ·  b_m(n − N)]^T,    (5)

and each H_m is the (N + 1) × (N + L + 1) filtering matrix

        | h_m,0  · · ·  h_m,L           0     |
H_m =   |         .  .          .  .          |    (6)
        |   0        h_m,0     · · ·    h_m,L |

Performing the eigendecomposition of R_xx,

R_xx = U Λ U^H = [U_s  U_n] diag(Λ_s, Λ_n) [U_s  U_n]^H,    (7)

the N + L + 1 largest eigenvalues (λ_i > σ_n² for i = 1, . . . , N + L + 1) correspond to the signal subspace spanned by U_s, while the remaining eigenvectors span the noise subspace U_n. Since the columns of H lie in the signal subspace, H is orthogonal to U_n, which leads to the objective function

J(h) = || H^H U_n ||.    (12)

Define the zero-padded parameter vectors

h1_m = [0  h_m^T]^T = [0  h_m,0  · · ·  h_m,L]^T,    (13)
h2_m = [h_m^T  0]^T = [h_m,0  · · ·  h_m,L  0]^T,    (14)
h1 = [h1_1^T  · · ·  h1_M^T]^T,    h2 = [h2_1^T  · · ·  h2_M^T]^T.    (15)
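The construction of the filtering matrices and of the noise subspace behind equations (6), (7), and (12) can be sketched numerically; this is a generic subspace computation under our own naming, not the authors' code:

```python
import numpy as np

def filtering_matrix(h, N):
    """Build the (N+1) x (N+L+1) Toeplitz-like filtering matrix H_m of
    eq. (6) for one subchannel impulse response h = [h_0, ..., h_L]."""
    L = len(h) - 1
    Hm = np.zeros((N + 1, N + L + 1), dtype=complex)
    for i in range(N + 1):
        Hm[i, i:i + L + 1] = h  # h shifted one column per row
    return Hm

def noise_subspace(Rxx, signal_dim):
    """Eigendecompose Rxx (eq. (7)) and return U_n, the eigenvectors
    beyond the signal-subspace dimension N + L + 1."""
    w, U = np.linalg.eigh(Rxx)          # eigenvalues in ascending order
    idx = np.argsort(w)[::-1]           # reorder: largest first
    return U[:, idx[signal_dim:]]       # columns past the signal subspace
```

In the noiseless case, || U_n^H H || of eq. (12) evaluates to zero for the true channel, which is what the GA exploits as an objective.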
value. Therefore, such a constraint is also helpful in making l converge to a smaller value.
To ensure condition (2), we suggest imposing a penalty on J(l, h) when a larger estimate of the channel order is obtained. In practice, the objective value J(l, h) converges to a small value rather than to exactly zero. Therefore, we apply multiplication instead of addition. The following objective function is proposed:

J(l, h) = l^K || U_n^H H ||,    (16)
3.
GENETIC ALGORITHM
h = [h_1  h_2  · · ·  h_T].    (17)
cum_i^j(l).    (18)

The above fitness function is not used in tournament selection but only in the convergence criterion of the order chromosomes.
Parent selection
A good parent selection mechanism gives better parents a better chance to reproduce. In the proposed GA, we employ an elitist method [8] and tournament selection [11]. First, the best chromosomes of the present population (an elitist fraction of the Q parents) are directly selected. Then, the remaining child chromosomes are generated via tournament selection within the whole parent population. That is, two chromosomes are randomly selected from the parent population in each cycle, and the one with the smaller objective value is selected.
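Tournament selection as described here fits in a few lines; this is a generic minimization tournament with names of our own choosing:

```python
import random

def tournament_select(population, objective):
    """Pick two chromosomes at random; the one with the smaller
    objective value wins and becomes a parent."""
    a, b = random.sample(population, 2)
    return a if objective(a) < objective(b) else b
```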
Crossover
Crossover combines the features of two parent chromosomes to form two child chromosomes. Generally, the parent chromosomes are mated randomly [12]. In the proposed GA, each chromosome contains two parts with different coding techniques. The order chromosome decides how many elements of the parameter chromosome are used to calculate the objective value. Therefore, these two parts cannot be decoupled, and conventional methods that perform crossover separately may not be efficient. Normally, the order chromosomes will be short. For instance, an order chromosome with a length of 5 implies a searching space from 1 to 32, which covers most practical cases of FIR channels. Therefore, the order chromosomes are expected to converge much faster than the parameter chromosomes. We propose not to perform crossover on the order chromosomes but to use mutation only. For the parameter chromosomes, crossover between chromosomes with different orders is more explorative (i.e., it searches more of the data space). However, it may also damage the building blocks in the parent chromosomes. On the other hand, crossover between chromosomes with the same order is more exploitative (i.e., it speeds up convergence). However, it may cause premature convergence. Since faster convergence is preferable in blind channel identification, we propose to mate chromosomes of the same order. For each estimated order, if the number of corresponding chromosomes is odd, a randomly selected chromosome is added to the mating pool.
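The same-order mating rule can be sketched as follows; the grouping is illustrative, and the (order, params) chromosome layout is our assumption:

```python
import random

def build_mating_pools(population):
    """Group chromosomes by their decoded order; each chromosome is an
    (order, params) pair. An odd-sized group gets one randomly chosen
    extra member so that every chromosome can be paired for crossover."""
    pools = {}
    for chrom in population:
        pools.setdefault(chrom[0], []).append(chrom)
    for pool in pools.values():
        if len(pool) % 2 == 1:
            pool.append(random.choice(pool))
    return pools
```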
Assume that the chromosomes are mated and a pair of them is given as

(c, h)_j^i = [c_1 c_2 · · · c_S , h_1 h_2 · · · h_T]_j^i,
(c, h)_k^i = [c_1 c_2 · · · c_S , h_1 h_2 · · · h_T]_k^i.    (19)

Crossover at a random position a_1 blends the genes after a_1 with coefficients λ:

h_j^{i+1} = [h_{1,j}^i · · · h_{a_1,j}^i , λ_{a_1+1} h_{a_1+1,j}^i + (1 − λ_{a_1+1}) h_{a_1+1,k}^i , · · ·],    (20)
h_k^{i+1} = [h_{1,k}^i · · · h_{a_1,k}^i , λ_{a_1+1} h_{a_1+1,k}^i + (1 − λ_{a_1+1}) h_{a_1+1,j}^i , · · ·].    (21)

Mutation perturbs the genes between two random positions a_3 and a_4 by a scaled amount Δ/P:

h_j^{i+1} = [h_1 , . . . , h_{a_3} , h_{a_3+1} + Δ_{a_3+1}/P, . . . , h_{a_4} + Δ_{a_4}/P, h_{a_4+1} , . . . , h_T].    (22)
[Table 2: GA parameter settings; recoverable entries include penalty scale 1/12, mutation rate pm = 0.5, and P = 10 · 2^⌊m/100⌋.]

Convergence criterion
We propose different convergence criteria for the order chromosomes and the parameter chromosomes. The order chromosomes are considered to be converged if the gene pool is dominated by a certain order, that is,

cum_i^j(l) ≥ D · Σ_l cum_i^j(l),    (23)

where D is a predefined ratio. A corresponding criterion (24), comparing the objective values J(c, h) against a predefined ratio e, declares the parameter chromosomes converged. Theoretically, the objective function in (16) has multiple minima that may have
overestimated orders. In order to cause the order chromosomes to converge on the correct channel order, we impose a
penalty on the chromosomes with greater order. Due to the
random nature of a GA, though in most cases the order
chromosomes can converge on the real channel order (see
the simulation result in Table 1), there is no guarantee that
the chromosomes will absolutely converge on the real channel order. Therefore, we propose to examine the converged
result to ensure correct convergence. If we let (c, h)_s1 be the current converged result, the examination can be carried out as follows (see the outer loop in Figure 2): reduce the order of (c, h)_s1 by 1, fix the order, and run the proposed GA again (note that this time the order chromosomes are fixed, i.e., pm = 0). After a few generations, a new result denoted as (c, h)_s2 can be achieved. If the objective values of (c, h)_s1 and (c, h)_s2, that is, J(c, h)_s1 and J(c, h)_s2, are close enough according to criterion (25), we decide that (c, h)_s1 has an overestimated order, take (c, h)_s2 as the new candidate, and repeat the examination; otherwise, (c, h)_s1 is accepted as the final estimate.
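The outer-loop order check can be summarized in a short sketch; `run_ga` is a hypothetical callable standing in for one inner-loop GA execution, and the ratio threshold is illustrative:

```python
def validate_order(run_ga, drop_ratio=10.0):
    """Outer loop: re-run the GA with the order reduced by one and fixed
    (pm = 0). If the two converged objective values are close, the order
    was overestimated; a large (exponential) jump confirms the estimate."""
    order, params, J1 = run_ga(order=None)        # inner loop, free order
    while order > 1:
        _, params2, J2 = run_ga(order=order - 1)  # fixed, reduced order
        if J2 / J1 < drop_ratio:   # values close -> order overestimated
            order, params, J1 = order - 1, params2, J2
        else:                      # exponential drop -> order confirmed
            break
    return order, params
```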
[Figure 2: Flow diagram of the proposed GA. The chromosomes are initialized and evolved in an inner loop until the convergence criteria (23) and (24) are met; the outer loop then stores the converged result, fixes the order (pm = 0), reinitializes the parameter chromosomes, and re-runs the GA to validate the estimated order via (25) before terminating.]
The overall flow diagram of the proposed approach is illustrated in Figure 2. It can be seen that the proposed GA has
an inner and an outer loop. The criteria in (23) and (24) in
the inner loop guarantee that a global optimum is achieved.
We have shown that this solution may have an overestimated
order. The criterion in (25) in the outer loop is used to reexamine the solution reached and guarantee the correct estimate.
It is important to note that although the order part and
the parameter part have a distinct representation, fitness
function, and convergence criterion, we encode the two parts
into a single chromosome rather than keeping two separate
chromosomes. This is because the order part decides how
many genes of the parameter chromosome should be used to
calculate the objective value and, therefore, these two parts
cannot be decoupled.
4.
EXPERIMENTAL RESULTS
Estimated order   Count   Percentage
6                 21      35%
7                 11      18.3%
8                 2       3.3%
Total             60      100%
[Figure 3: A typical evolution curve: the average objective value J(c, h) and the average order of the population are plotted against the generation number.]
order 8 represents a real channel of order 16, which covers most normal channels. Note that order chromosomes of length 3 can also map the searching space from 9 to 16. So, in case no satisfactory solution is reached, one may remap the order searching space (9–16) and rerun the algorithm. A large mutation rate (pm = 0.5) is adopted to prevent premature convergence. To speed up the convergence of the parameter chromosomes, we adjust P every 100 generations (see Table 2), where ⌊a⌋ denotes the floor of a.
A 25-dB Gaussian white noise is added to the output and
2,000 output samples are used to estimate the autocorrelation matrix Rxx . Figure 3 shows a typical evolution curve. In
each generation, the average objective value and estimated
order of the whole population are plotted. From Figure 3,
one can see that the order chromosomes converge much
faster than the parameter chromosomes. They converge on
the true channel order in the first inner loop run (order = 5
in Figure 3). We store this converged result, reduce the order
by 1, set pm = 0, and then begin another GA execution. After
the convergence (order = 4 in Figure 3), we evaluate these
two converged results (order = 5 and order = 4 in Figure 3)
by using the outer loop criterion in (25). Since there is an exponential drop between the two results, the condition in (25)
is satisfied. Thus, our algorithm stops and concludes that order 5 is the final estimate.
The channel order is estimated by detecting the drop between two converged objective values, which is similar to the traditional method where the eigenvalues of an overmodeled covariance matrix are calculated and the channel order is determined when there is a significant drop between two adjoining eigenvalues [4]. However, our algorithm is more efficient, since the calculation of the eigenvalue decomposition can be avoided, and the drop is much more significant (an exponential drop).
Figure 4 shows an evolution curve where the channel order is overestimated in the first inner loop run (order = 6
in Figure 4). In Figure 4, the objective values of the first two
converged results are quite close, which does not satisfy the
criterion set in (25). Further examination is thus required.
As above, we can get the third converged result (order = 4 in
Figure 4). By evaluating it with (25), we can draw the same
conclusion as from Figure 3.
When compared with existing work, the convergence speed of the proposed GA is satisfactory: a quite reliable solution can be reached in about 1,000 generations, whereas the algorithm in [9] converges after 2,000 generations (note that in [9] the channel order is assumed to be known). In [8], an identification problem with similar complexity is simulated. The algorithm converges after hundreds of generations, but it is nonblind.
RMSE = (1/‖h‖) √( (1/N_t) Σ_{i=1}^{N_t} ‖ĥ^(i) − h‖² ).    (27)

[Figure: RMSE versus SNR (dB) comparing SS-SVD and SS-GA.]

5.
CONCLUSIONS
ACKNOWLEDGMENTS
The authors would like to express their appreciation to Prof. Riccardo Poli, the Editor-in-Charge of this manuscript, for his effort in improving the quality and readability of this paper. This work was done while Dr. Chen was visiting the City University of Hong Kong; his work was supported by City University Research Grant 7001416 and the Doctoral Program Fund of China under Grant 20010561007.
REFERENCES
[1] L. Tong, G. Xu, and T. Kailath, Blind identification and equalization based on second-order statistics: a time domain approach, IEEE Transactions on Information Theory, vol. 40, no. 2, pp. 340–349, 1994.
[2] L. Tong, G. Xu, B. Hassibi, and T. Kailath, Blind channel identification based on second-order statistics: a frequency-domain approach, IEEE Transactions on Information Theory, vol. 41, no. 1, pp. 329–334, 1995.
[3] L. Tong and S. Perreau, Multichannel blind identification: from subspace to maximum likelihood methods, Proceedings of the IEEE, vol. 86, no. 10, pp. 1951–1968, 1998.
[4] A. P. Liavas, P. A. Regalia, and J.-P. Delmas, Blind channel approximation: effective channel order determination, IEEE Transactions on Signal Processing, vol. 47, no. 12, pp. 3336–3344, 1999.
[5] L. Tong and Q. Zhao, Joint order detection and blind channel estimation by least squares smoothing, IEEE Transactions on Signal Processing, vol. 47, no. 9, pp. 2345–2355, 1999.
[6] J. Ayadi and D. T. M. Slock, Blind channel estimation and joint order detection by MMSE ZF equalization, in Proc. IEEE 50th Vehicular Technology Conference (VTC '99), vol. 1, pp. 461–465, Amsterdam, The Netherlands, September 1999.
[7] L. Yong, H. Chongzhao, and D. Yingnong, Nonlinear system identification with genetic algorithms, in Proc. 3rd Chinese World Congress on Intelligent Control and Intelligent Automation (WCICA '00), vol. 1, pp. 597–601, Hefei, China, June–July 2000.
[8] L. Yao and W. A. Sethares, Nonlinear parameter estimation via the genetic algorithm, IEEE Transactions on Signal Processing, vol. 42, no. 4, pp. 927–935, 1994.
[9] S. Chen, Y. Wu, and S. McLaughlin, Genetic algorithm optimization for blind channel identification with higher order cumulant fitting, IEEE Transactions on Evolutionary Computation, vol. 1, no. 4, pp. 259–265, 1997.
[10] E. Moulines, P. Duhamel, J.-F. Cardoso, and S. Mayrargue, Subspace methods for blind identification of multichannel FIR filters, IEEE Transactions on Signal Processing, vol. 43, no. 2, pp. 516–525, 1995.
1.
INTRODUCTION
A tracking filter has the double goal of reducing measurement noise and consistently predicting future values of the signal. This kind of problem has efficient solutions in the case of stationary signals, but solutions for nonstationary problems are not so consolidated yet. This is the case in the field we are dealing with in this paper: tracking aircraft trajectories from radar measurements in air traffic control (ATC) applications.
we explain the proposed optimization method based on ES. Finally, Sections 5 and 6 discuss the optimization results and the characteristics of the solutions minimizing the fitness function, and summarize the main conclusions.
2.
Out of all possible combinations, ARTAS has carried out a choice containing the most important and realistically worst cases. It comprises a number of simple input scenarios on which the nominal track quality requirements are defined. The methodology specified for this evaluation is based on Monte Carlo simulation with the input parameters (radar and trajectory parameters) particularized for each scenario. The trajectories in different scenarios vary in the following features:
(i) orientation with respect to the radar (radial or tangential starting courses, starting at a short, medium, or maximum range);
(ii) sequence of different modes of flight (uniform, turns, and longitudinal accelerations);
(iii) values of accelerations (upper and lower limits);
(iv) values of speeds (upper and lower limits).
There are eight specified simple scenarios with uniform motion, and twelve complex scenarios including initialization with uniform motion, transition to a transversal maneuver, and a second transition back to uniform motion. When the target is far enough from the radar, a pure radial approach to the radar leads to the worst case for transversal and heading errors during maneuver transitions, since the azimuth error (much higher than the radial error) is projected over these components. By a similar reasoning, a pure tangential approach is the worst case for longitudinal and ground-speed errors during maneuvers. So, the scenarios basically contain these two types of situations, varying in distance, velocities, and acceleration magnitudes. The authors have considered a couple of scenarios with longitudinal maneuvers, although ARTAS does not specify performance for that type of situation. The reason for this is that these operations appear in civil operations (especially in the TMAs) and the filter is conceived to operate in real conditions. Otherwise, the resulting tracking filter could be overfitted to transversal maneuvers while developing undesirable systematic errors with longitudinal maneuvers. The specifications for longitudinal scenarios were obtained by extrapolating the ARTAS relations for the new input conditions. The resulting 22 scenarios, to be taken into account in the design of the tracking filter, are shown in Figure 1 (a circle represents the radar position and a square the initial position of the target trajectory). Since the specifications depend tightly on the input conditions, there is no a priori worst-case scenario whose attainment would guarantee all cases; all of them have to be considered simultaneously in the design process. It must be taken into account that the design of the tracker will be done considering that all requirements must be met without intermediate adaptation of the tracker parameters once the tracker has been tuned for the typical radar characteristics and controlled volume (in this case, the en-route area). The design will provide a single set of parameters that allows the filter to accomplish all the specifications in all the scenarios considered.
For each of these scenarios, the performance of the tracker should approach the listed performance goal values.
[Figure 1: The 22 design scenarios. A circle represents the radar position and a square the initial position of the target trajectory; the scenarios vary in initial range (15–230 NM), speed (150 or 300 m/s), and maneuver acceleration (1.2, 2.5, or 6 m/s²).]
[Specification matrix: for each scenario j and each tracked variable (position, heading, longitudinal components, etc.), the specified values s(PV_ij), s(CV1_ij), and s(CV2_ij) are tabulated.]
[Figure: RMS error specification for one scenario, defined by a peak value (PV) and two convergence values (CV1, CV2) over time.]
[Figure: Kalman filter recursion: each received plot z[k] drives a prediction and an update step, producing x[k] and P[k] from x[k − 1] and P[k − 1].]
3.

The IMM algorithm develops the following four steps to process the measures received from the available sensors and estimate the target state x[k], P[k]: intermode interaction/mixing, prediction, updating, and combination for output.
(i) The tracking cycle for each received plot z[k] starts with the interaction phase, mixing the state estimates coming from each of the four models to obtain the new inputs x_0j[k] and P_0j[k]. So, the input to each Kalman filter is not directly the last update but a weighted combination of all modes, taking into account the mode probabilities. This step is oriented to ensure that the most probable mode dominates the rest.
(ii) Then, the prediction and updating phases are performed with the Kalman filter equations according to the available models for target motion contained in each mode.
(iii) The estimated probabilities of the modes, μ_j[k], are updated based on two types of variables: the a priori transition probabilities of the Markov chain, p_ij, and the mode likelihoods computed from the residuals between each plot and the mode predictions.
(iv) Finally, the mode probabilities are employed as weights to combine the partial tracks for the final output. Besides, each individual output and probability is internally stored to process plots coming in the future.
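The interaction/mixing step of item (i) can be sketched as follows; this is the standard IMM mixing computation under our own variable names, not the authors' implementation:

```python
import numpy as np

def imm_mixing(x, P, mu, T):
    """IMM interaction step: mix the mode estimates x[j], P[j] using the
    mode probabilities mu and the Markov transition matrix T to form the
    inputs x0[j], P0[j] of each mode's Kalman filter."""
    n = len(x)
    c = T.T @ mu                          # predicted mode probabilities
    w = (T * mu[:, None]) / c[None, :]    # w[i, j] = P(mode i | mode j now)
    x0, P0 = [], []
    for j in range(n):
        xj = sum(w[i, j] * x[i] for i in range(n))
        Pj = sum(w[i, j] * (P[i] + np.outer(x[i] - xj, x[i] - xj))
                 for i in range(n))
        x0.append(xj)
        P0.append(Pj)
    return x0, P0
```

With an identity transition matrix, each filter simply keeps its own estimate, which is a quick sanity check of the mixing weights.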
3.2.
[Figure: Block diagram of the IMM structure with four Kalman filters. The interaction/combination stage mixes the mode estimates x_j[k − 1], P_j[k − 1] using the mode probabilities μ_j[k − 1] to form the filter inputs x_0j[k − 1], P_0j[k − 1]; each plot z[k] is processed by the four filters, the mode probabilities μ_j[k] are recomputed, and the mode outputs are combined into x[k], P[k].]
[Table 2: Parameters of the IMM tracking structure: the Markov transition probabilities pUT, pUL, pTU, pLU, the typical transversal acceleration a_t, and the plant noise variances σ_t² and σ_l².]
with zero plant variance noise. Modes for tracking transversal maneuvers (turns), modes 2 and 3, are filters with circular extrapolation dynamics [4, 5], one for each possible direction. They provide a highly adaptive response to transversal transitions, one of the parameters to fix in this filter being the typical acceleration of the target when performing turns. Finally, mode 4 is a linear-extrapolation motion model with a plant noise component projected along the longitudinal direction. Since target deviations along the transversal direction are covered by the circular modes, this last model will quickly detect and adapt to variations in longitudinal velocity during accelerations and decelerations.
Each mode in the structure has its own parameters to tune, which must be adjusted in the design process. Besides, the transition probabilities between all possible pairs of modes, modelled as a Markov chain, are directly related to the rate of change from any mode to the rest. They have a very deep impact on the tracker behaviour during transitions.
         | p11  p12  p13  p14 |   | 1 − 2pUT − pUL   pUT       pUT       pUL     |
T[k] =   | p21  p22  p23  p24 | = | pTU              1 − pTU   0         0       |    (1)
         | p31  p32  p33  p34 |   | pTU              0         1 − pTU   0       |
         | p41  p42  p43  p44 |   | pLU              0         0         1 − pLU |
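The transition matrix in (1) can be built directly from the four probabilities; in this small sketch the first-row diagonal entry is completed so that every row sums to one, which is our reading of the simplification described in the text:

```python
import numpy as np

def transition_matrix(p_ut, p_ul, p_tu, p_lu):
    """Markov transition matrix of eq. (1): mode 1 is uniform motion,
    modes 2-3 are the two turn directions, mode 4 is longitudinal."""
    return np.array([
        [1 - 2 * p_ut - p_ul, p_ut,     p_ut,     p_ul],
        [p_tu,                1 - p_tu, 0.0,      0.0],
        [p_tu,                0.0,      1 - p_tu, 0.0],
        [p_lu,                0.0,      0.0,      1 - p_lu],
    ])
```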
The number of parameters have been simplified by considering only as possible transitions between uniform motion and
the rest of modes. The parameters pUT , pUL are the probabilities of starting transversal and longitudinal maneuvers, given
an aircraft at uniform motion, while the parameters pTU , pLU
are the probabilities of transitions to uniform motion, given
that the aircraft is performing, respectively, transversal and
longitudinal maneuvers.
It is important to notice that all parameters, those in each particular model plus the transition probabilities of the Markov chain, are completely coupled through the IMM algorithm, since the partial outputs from each mode are combined and fed back to all modes. There is therefore a strongly nonlinear interaction between them, making the adjustment process certainly difficult. The whole set of parameters in the tracking structure is summarized in Table 2.
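Under the simplification described in the text, the Markov matrix of equation (1) is fully determined by the four probabilities p_UT, p_UL, p_TU, and p_LU. A minimal sketch of how it can be built and checked (function name and example values are illustrative, not from the paper; the uniform-motion row is assumed to take the remaining probability mass):

```python
def transition_matrix(p_ut, p_ul, p_tu, p_lu):
    """4x4 Markov transition matrix of eq. (1): mode 1 = uniform motion,
    modes 2-3 = the two turn directions, mode 4 = longitudinal mode.
    Only transitions between uniform motion and the rest are allowed."""
    return [
        [1 - 2 * p_ut - p_ul, p_ut,       p_ut,       p_ul],
        [p_tu,                1 - p_tu,   0.0,        0.0],
        [p_tu,                0.0,        1 - p_tu,   0.0],
        [p_lu,                0.0,        0.0,        1 - p_lu],
    ]

T = transition_matrix(0.05, 0.02, 0.10, 0.10)
for row in T:
    assert abs(sum(row) - 1.0) < 1e-12  # each row is a probability distribution
```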
4.
The design of the particular IMM tracking structure addressed in this work, stated as adjusting the seven numeric input parameters to fit the filter performance within the ARTAS specifications, can be generally considered a numerical optimization problem. We are searching for the combination of real input parameters that minimizes a real function assessing the quality of solutions as a cost, f : V ⊂ R⁷ → R. The final design solution x_d ∈ V should be a global minimum, f(x_d) ≤ f(x) for any x ∈ V ⊂ R⁷. The subspace V stands for the region
of feasible solutions, defined as those vectors representing
a valid IMM filter: parameters for probabilities must fall in
the interval [0, 1], and parameters for variances must be positive. These are the only constraints to be satisfied by solutions during the search. Performance specifications are not treated as constraints here; instead, they are used as penalty terms in the objective cost function. The cost achieves its minimum value of zero only in the ideal case of a solution meeting all specifications, grading all other cases with a positive global cost that will be detailed later.
4.1. Evolution strategies
In numeric optimization problems, when f is a smooth, low-dimensional function, a number of classic optimization methods are available. The best case is that of low-dimensional analytical functions, whose solutions can be determined analytically or found with simple sampling methods. If the partial derivatives of the function with respect to the input parameters are available, gradient-descent methods can be used to find the directions leading to a minimum. However, gradient-descent methods quickly converge to, and stop at, local minima, so additional steps must be added to find the global minimum. For instance, with a moderate number of local minima, we could run several gradient-descent solvers from different starting points and keep the best solution. The problem is that the number of similar local minima increases exponentially with dimensionality, making these types of solvers unfeasible. In our particular case, besides a high-dimensional input space with a multimodal cost, we do not have an analytical function to optimize: the cost is the result of a complex and exhaustive evaluation process involving the simulation and performance assessment of the tracking structure on the whole set of
22 scenarios defined. The evaluation of a single point in the
input space requires several minutes of CPU time (Pentium
III, 700 MHz). Besides, the evaluation of quality after all the simulations is not direct: it must take into account the system performance in all scenarios and magnitudes, in comparison with the whole table of specifications. As we will see later, multiple specifications (or objectives) increase the number of solutions with similar performance, and therefore the complexity of the search.
For complex domains, evolutionary algorithms have proven to be robust and efficient stochastic optimization methods, combining properties of volume-oriented and path-oriented searching techniques. ES [6] are the evolutionary algorithms specifically conceived for numerical optimization, and they have been successfully applied to engineering optimization problems with real-valued vector representations [7]. They combine a search process which randomly scans the feasible region (exploration) with local optimization along certain paths (exploitation), achieving very acceptable rates of robustness and efficiency. Each solution to the problem is defined as an individual in a population, each individual being codified by a pair of real-valued vectors: the searched parameters and the standard deviation of each parameter used in the search process. In this specific problem, one individual represents the set of dynamic parameters in the IMM structure, as indicated in Table 2, (x₁, ..., x₇), and their corresponding standard deviations (σ₁, ..., σ₇).
The optimization search basically consists in evolving a population of individuals in order to find better solutions. The computational procedure of ES can be summarized in the following steps, according to the (μ + λ) strategy defined by Bäck and Schwefel [8], particularized for our problem:
(1) generate an initial population with μ individuals uniformly distributed on the search space V;
(2) evaluate the objective value f(x_i), i = 1, ..., μ, for each individual in the population;
(3) select the best parents in the population to generate a set of λ new individuals, by means of the genetic operators of recombination and mutation. In this case, recombination follows a canonical discrete recombination [6], and mutation is carried out as follows:

$$\sigma_i' = \sigma_i \exp\big(N(0, \tau)\big), \qquad x_i' = x_i + N\big(0, \sigma_i'\big), \quad (2)$$
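The steps above can be sketched as a minimal (μ + λ)-ES on a toy sphere objective (the objective, the strategy parameter τ, and the population sizes are illustrative choices, not the paper's settings):

```python
import math
import random

random.seed(1)

def sphere(x):  # toy objective standing in for the real tracking cost f
    return sum(v * v for v in x)

def evolve(f, dim=3, mu=10, lam=40, tau=0.3, generations=60):
    # each individual = (parameter vector x, per-parameter std deviations sigma)
    pop = [([random.uniform(-5, 5) for _ in range(dim)],
            [1.0] * dim) for _ in range(mu)]
    for _ in range(generations):
        children = []
        for _ in range(lam):
            xa, sa = random.choice(pop)      # discrete recombination of two
            xb, sb = random.choice(pop)      # randomly chosen (elite) parents
            x = [random.choice(p) for p in zip(xa, xb)]
            s = [random.choice(p) for p in zip(sa, sb)]
            # mutation, eq. (2): sigma'_i = sigma_i * exp(N(0, tau)),
            #                    x'_i = x_i + N(0, sigma'_i)
            s = [si * math.exp(random.gauss(0.0, tau)) for si in s]
            x = [xi + random.gauss(0.0, si) for xi, si in zip(x, s)]
            children.append((x, s))
        # (mu + lambda): parents and offspring compete for survival
        pop = sorted(pop + children, key=lambda ind: f(ind[0]))[:mu]
    return pop[0]

best_x, _ = evolve(sphere)
print(sphere(best_x))  # small residual cost, close to the optimum at 0
```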
in each specific figure of merit. When a problem involves the simultaneous optimization of multiple, usually conflicting, objectives (or criteria), the goal is not as clear as in the case of single-objective optimization. The presence of different objectives generates a set of alternative solutions, defined as Pareto-optimal solutions [11]. With conflicting multiple objectives, different solutions cannot be directly compared and ranked to determine the best one; instead, the concept of domination is used for comparisons: a solution x₁ is dominated by a second one x₂ if x₂ is at least as good in every objective and strictly better in at least one. A classical way to scalarize the problem is a weighted sum of the objectives,

$$f(\vec{x}) = \sum_{i} w_i f_i(\vec{x}). \quad (3)$$
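The domination test and the weighted-sum scalarization of equation (3) can be sketched as follows (function names are illustrative; objectives are minimized):

```python
def dominates(f1, f2):
    """True if objective vector f1 dominates f2: f1 is no worse in every
    objective (minimization) and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def weighted_sum(f, w):
    """Scalarization of eq. (3): collapse the objectives into one cost."""
    return sum(wi * fi for wi, fi in zip(w, f))

assert dominates((1.0, 2.0), (1.0, 3.0))
assert not dominates((1.0, 3.0), (2.0, 2.0))  # conflicting: neither dominates
assert weighted_sum((1.0, 3.0), (0.5, 0.5)) == 2.0
```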
[Figure: Pareto-optimal front in the (f₁, f₂) plane; different weight vectors (w₁, w₂) select different points of the front as minima of w₁f₁ + w₂f₂.]
$$R(x) = \begin{cases} x, & x > 0, \\ 0, & x \le 0. \end{cases} \quad (4)$$
[Figure: fitness of the best individual versus generations.]
$$c_i = R\left(\frac{p_i - s(p_i)}{p_i}\right). \quad (5)$$
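Equations (4) and (5) together penalize only the specifications that are violated; a minimal sketch (names illustrative, normalization as in equation (5)):

```python
def ramp(x):
    """Eq. (4): R(x) = x for x > 0, else 0."""
    return x if x > 0 else 0.0

def spec_penalty(p, spec):
    """Eq. (5): relative excess of the measured figure p over its
    specification s(p); zero when the specification is met."""
    return ramp((p - spec) / p)

assert spec_penalty(100.0, 120.0) == 0.0  # within specification: no cost
assert abs(spec_penalty(120.0, 100.0) - 20.0 / 120.0) < 1e-12
```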
(iii) In order to add some flexibility in the trade-off between maneuver and uniform-motion performance, weighting factors λ are included. They allow us to vary the priority of these performance figures, in the case where all of them cannot be attained at the same time, defining therefore a cost per jth scenario,
$$c_{sj} = \sum_{i=1}^{4} \left[ \lambda_{PV}\, R\!\left(\frac{PV_{ij} - s(PV_{ij})}{PV_{ij}}\right) + \lambda_{CV1}\, R\!\left(\frac{CV1_{ij} - s(CV1_{ij})}{s(CV1_{ij})}\right) + \lambda_{CV2}\, R\!\left(\frac{CV2_{ij} - s(CV2_{ij})}{s(CV2_{ij})}\right) \right], \quad (6)$$
where the subindex i represents each interest magnitude (longitudinal, transversal, groundspeed, and
heading) and j the scenario index.
(iv) Finally, considering the set E of all the scenarios where the performance figures are evaluated (in our example, the 22 scenarios indicated in Figure 1), the worst case over the scenarios j ∈ E is taken, for each figure of merit and selected time
$$c = \sum_{i} \left[ \lambda_{PV} \max_{j \in E} R\!\left(\frac{PV_{ij} - s(PV_{ij})}{PV_{ij}}\right) + \lambda_{CV1} \max_{j \in E} R\!\left(\frac{CV1_{ij} - s(CV1_{ij})}{s(CV1_{ij})}\right) + \lambda_{CV2} \max_{j \in E} R\!\left(\frac{CV2_{ij} - s(CV2_{ij})}{s(CV2_{ij})}\right) \right]. \quad (7)$$
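The global cost of equation (7) is a worst-case (max over scenarios) aggregation of the per-figure penalties. A sketch, assuming the data is laid out as per-magnitude lists of per-scenario values (layout and weights are illustrative):

```python
def ramp(x):
    """Eq. (4): R(x) = x for x > 0, else 0."""
    return x if x > 0 else 0.0

def global_cost(pv, cv1, cv2, s_pv, s_cv1, s_cv2, lam=(1.0, 1.0, 1.0)):
    """Eq. (7): for each interest magnitude i, take the worst (max) relative
    violation over the set E of scenarios j, then sum over magnitudes.
    pv[i][j], cv1[i][j], cv2[i][j] are measured figures; s_* are the specs."""
    cost = 0.0
    for i in range(len(pv)):
        cost += lam[0] * max(ramp((pv[i][j] - s_pv[i][j]) / pv[i][j])
                             for j in range(len(pv[i])))
        cost += lam[1] * max(ramp((cv1[i][j] - s_cv1[i][j]) / s_cv1[i][j])
                             for j in range(len(cv1[i])))
        cost += lam[2] * max(ramp((cv2[i][j] - s_cv2[i][j]) / s_cv2[i][j])
                             for j in range(len(cv2[i])))
    return cost

# one magnitude, two scenarios: only scenario 1 violates the PV specification
c = global_cost(pv=[[100.0, 150.0]], cv1=[[1.0, 1.0]], cv2=[[1.0, 1.0]],
                s_pv=[[120.0, 120.0]], s_cv1=[[2.0, 2.0]], s_cv2=[[2.0, 2.0]])
assert abs(c - 30.0 / 150.0) < 1e-12  # only the worst-case PV violation counts
```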
RESULTS
[Figure 9: final fitness of the best individual obtained in each of ten independent runs. Figure 10: fitness landscape over the grid of linear combinations of solutions 1, 2, and 5.]
presented scenarios with worst cases but for all design scenarios as well.
Different runs of the global optimization process (using different random seeds to generate the individuals in the initial population) were carried out to analyze the consistency of the solutions obtained. The results of ten independent runs are shown in Figure 9, presenting only the best individual in the population after optimization (instead of the whole evolution process) and the final fitness values achieved.
As can be seen, the different runs led to solutions that are quite consistent in terms of overall fitness and in terms of which specifications give the filter problems (always those in scenarios 12 and 13). However, the specific vector solutions found after optimization in each run showed significant differences, indicating that the fitness function probably has a multimodal landscape, even after selecting a particular set of weighting factors among specifications, λ = 1.
Since it is not possible to represent a fitness landscape in seven dimensions, the following analysis was carried out. The three solutions with the closest fitness values, resulting from runs 1, 2, and 5, were selected to be combined into a grid of linear combinations (convex hull) as follows:
$$\vec{x} = \vec{x}_1 + \alpha\left(\vec{x}_2 - \vec{x}_1\right) + \beta\left(\vec{x}_5 - \vec{x}_1\right). \quad (8)$$
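The grid of linear combinations of equation (8) can be sketched as follows (function names and the grid bounds are illustrative; each generated point would then be evaluated with the fitness function to map the landscape):

```python
def combine(x1, x2, x5, alpha, beta):
    """Eq. (8): x = x1 + alpha*(x2 - x1) + beta*(x5 - x1)."""
    return [a + alpha * (b - a) + beta * (c - a)
            for a, b, c in zip(x1, x2, x5)]

def grid(x1, x2, x5, lo=-0.5, hi=1.5, step=0.1):
    """Yield every grid point with alpha, beta in [lo, hi] at the given step."""
    n = round((hi - lo) / step)
    for i in range(n + 1):
        for j in range(n + 1):
            yield combine(x1, x2, x5, lo + i * step, lo + j * step)

# alpha = 1, beta = 0 recovers solution 2; alpha = beta = 0 recovers solution 1
assert combine([0, 0], [1, 2], [3, 4], 1.0, 0.0) == [1, 2]
assert combine([0, 0], [1, 2], [3, 4], 0.0, 0.0) == [0, 0]
```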
The fitness landscape for a grid with α and β varying in the interval [−1.5, 0.5], in steps of 0.1 units, is shown in Figure 10. It can be seen that the fitness is practically flat over the particular region of the search space represented by linear combinations
[Figure: fitness landscape over the (α, β) grid for λ = 1.]
CONCLUSION
Autónoma,
Madrid, Spain in 1995, and his
Ph.D. degree in computer engineering from
Universidad Carlos III de Madrid in 2000.
Since 2002, he has been an Assistant Professor there, teaching automata theory and programming language translation. His main
research topics are evolutionary computation applications and network optimization
using soft computing.
Rita Cucchiara
Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Via Vignolese 905, 41100 Modena, Italy
Email: rita.cucchiara@unimo.it
Luigi Cinque
Dipartimento di Informatica, Università di Roma La Sapienza, Via Salaria 113, 00198 Roma, Italy
Email: cinque@dsi.uniroma1.it
Stefano Levialdi
Dipartimento di Informatica, Università di Roma La Sapienza, Via Salaria 113, 00198 Roma, Italy
Email: levialdi@dsi.uniroma1.it
Received 1 July 2002 and in revised form 19 November 2002
Several range image segmentation algorithms have been proposed, each one to be tuned by a number of parameters in order to provide accurate results on a given class of images. Segmentation parameters are generally affected by the type of surfaces (e.g., planar versus curved) and the nature of the acquisition system (e.g., laser range finders or structured light scanners). It is impossible to answer the question, which is the best set of parameters given a range image within a class and a range segmentation algorithm? Systems proposing such a parameter optimization are often based either on careful selection or on solution space-partitioning methods. Their main drawback is that they have to limit their search to a subset of the solution space to provide an answer in acceptable time. In order to provide a different automated method to search a larger solution space, and possibly to answer the above question more effectively, we propose a tuning system based on genetic algorithms. A complete set of tests was performed over a range of different images and with different segmentation algorithms. Our system provided a particularly high degree of effectiveness in terms of segmentation quality and search time.
Keywords and phrases: range images, segmentation, genetic algorithms.
1. INTRODUCTION
Image segmentation problems can be approached with several solution methods, and the range image segmentation subfield has been addressed in different ways. But, since an algorithm should work correctly for a large number of images in a class, such a program is normally characterized by a high number of tuning parameters needed to obtain a correct, or at least satisfactory, segmentation.
Usually the correct set of parameters is given by the developers of the segmentation algorithm, and it is expected to give satisfactory segmentations for the images in the class used to tune the parameters. But it is possible that, when the input image class changes, the results are no longer satisfactory. To avoid exhaustive test tuning, an expert system to tune the parameters could be proposed. In this way, it should be pos-
2. RELATED WORKS
to extract, via classification, edges from noisy range images. Several algorithms (particularly color segmentation algorithms) are described or summarized in [12].
Parameter tuning is still a main task, and a possible solution is proposed. A different method to tune parameter sets is given by Min et al. in [2, 3, 4]. Its main drawback seems to be that only a limited subset of the complete solution space is allowed to be explored, which exposes the method to the possibility of missing the global optimum, or a good enough local optimum. But such a method is fast and efficient enough to serve as a fine-tuning step: given a set of rough local suboptima, the algorithm proposed in [2] could quickly explore a limited space around these suboptima to reach, if they exist, local optima.
In [6], for the first time, an objective performance comparison of range segmentation algorithms has been proposed. Further results on such comparison have been proposed in [3, 4, 13, 14]. Another comparison has been presented in [15], where another range segmentation algorithm
is proposed. This is based on a robust clustering method
(used also for other tasks). But the need for tuning algorithm
parameters is still present.
2.2.
GAs are a well-known and widespread technique for exploring a solution space in parallel by encoding the concept of evolution in the algorithmic search: from a population of individuals representing possible problem solutions, evolution is carried out by means of selection and reproduction of new solutions. The basic principles of GAs are now well known. Standard references are the books of Goldberg [16] and Michalewicz [17]; a survey is presented in [18], while a detailed explanation of a basic GA for solving NP-hard optimization problems, presented by Bhanu et al., can be found in [1].
Many GA-driven segmentation algorithms have been proposed in the literature. In particular, an interesting solution was presented by Yu et al. [19]: an algorithm that can segment and reconstruct range images via a method called RESC (RESidual Consensus). Chun and Yang [20] presented an intensity-image segmentation method based on a GA exploiting split-and-merge strategies, and Andrey and Tarroux [21] proposed an algorithm which can segment intensity images by including production rules in the chromosome, that is, a data string representing all the possible features present in a population member. Methods for segmenting textured images are described by Yoshimura and Oe [22] and Tseng and Lai [23]. The first one adopts a small region-representing chromosome, while the second one uses GAs to improve the iterated conditional modes (ICM) algorithm [24]. Cagnoni et al. [25] presented a GA based on a small set of manually traced contours of the structure of interest (anatomical structures in three-dimensional medical images). The method combines the good trade-off between simplicity and versatility offered by polynomial filters with the regularization properties that characterize elastic-contour models. Andrey [26]
proposed another interesting work, in which the image to be
Using the same rationale as in [1], we adopted a GA for tuning the set of parameters of a range segmentation algorithm.
Different approaches to the tuning of parameters are represented by evolutionary programming (EP) and evolution strategies (ES).
The first one places emphasis on the behavioral linkage between parents and their offspring (the solutions). Each solution is replicated into a new population and is mutated according to a distribution of mutation types. Each offspring solution is assessed by computing its fitness. Similarly, the second one tries random changes in the parameters defining the solution, following the example of natural mutations.
Like both ES and EP, a GA is a useful method of optimization when other techniques, such as gradient descent or direct analytical discovery, are not applicable. Combinatorial and real-valued function optimization problems in which the optimization surface or fitness landscape is rugged, possessing many locally optimal solutions, are well suited to GAs.
We chose GA because it is a well-tested method in image
segmentation and a good starting point to explore the evolutionary framework.
Because of the universal model, we have the possibility of changing the segmentation algorithm with few consequent changes in the GA code. These changes mainly involve
the chromosome composition and the generation definition.
The fitness evaluation has been modeled for the problem of range segmentation and can be kept constant, as can the reproduction model. This is one of the features of our proposal, which we called GASE, or genetic algorithm segmentation environment (introduced as GASP in [30]).
The main goal of GASE is to suggest a signature for a class
of images, that is, the best fitted set of parameters performing
the optimal segmentation. In this way, when our system finds
a good segmentation for an image or for a particular surface,
we can say that the same parameters will work correctly for the same class of images or for the same class of surfaces (i.e., all the surfaces presenting a large curvature radius).
3.1.
In Figure 1, we show the architecture of our system. Following the block diagram, we see that an input image I_i is first segmented by a program s (range segmentation algorithm) with a parameter set θ_sj, producing a new image having labeled surface patches M_isj. All such segmented images are stored in a database that we call the phenotype repository. Briefly, we may write

$$M_{isj} = \text{segmentation}\big(s, \theta_{sj}, I_i\big). \quad (1)$$
$$F_{isj} \ge 0. \quad (2)$$
This process is carried out for all available images with different parameter sets. The sets that produce the best results (called θ_w) are stored in the so-called final genotype repository (if the fitness function is under a given threshold). Once the score is assigned, a tuple P_ij containing the genotype, the score value, the phenotype identifier, and the generation, (θ_sj, F_isj, ij, k), is written in a database called the evaluation repository. The genetic computation selects two individuals to be coupled among the living ones (mating individuals selection); these genotypes are processed by the crossover block, which outputs one or more offspring that may be mutated. The generated individuals will be the new genotypes θ_sj in the next generation step.
At the end of a generation, a to-be-deleted individuals selection is performed. The decision on which individuals are to be erased from the evaluation repository is made by fixing a killing probability p_k depending on the fitness and the age of the individuals (their k value). If an individual has a score greater than p_k, the solution it represents will no longer be considered. In this way, we have a limited number of evaluated points in the solution space.
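The to-be-deleted individuals selection can be sketched as follows. This is a loose interpretation: the exact form of the fitness- and age-dependent threshold p_k is not specified here, so the threshold function below is illustrative:

```python
def prune(repository, p_k):
    """Keep only individuals whose score does not exceed the killing
    threshold p_k(score, age); the rest leave the evaluation repository.
    Each entry is (genotype, fitness_score, phenotype_id, age)."""
    return [ind for ind in repository if ind[1] <= p_k(ind[1], ind[3])]

# illustrative threshold: tolerate worse scores for younger individuals
threshold = lambda score, age: 10.0 - 0.5 * age

repo = [("g1", 4.0, "m1", 2), ("g2", 9.5, "m2", 4), ("g3", 6.0, "m3", 1)]
survivors = prune(repo, threshold)
assert [g for g, *_ in survivors] == ["g1", "g3"]  # g2 exceeds its threshold
```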
3.2.
GASE features
[Figure 1: architecture of GASE. The segmentation algorithm s, driven by a genotype θ_sj, produces phenotypes M_isj from input images I_i; fitness evaluation against the training set and prototype repository writes tuples (θ_sj, F_isj, ij, k) into the evaluation repository; the genetic evolution blocks (mating individuals selection, crossover, mutation, reproduction, to-be-deleted individuals selection) produce the new genotypes.]
3.3. Genetic-based learning
correct segmentation,
oversegmentation,
undersegmentation,
miss-segmentation,
noise segmentation.
vectors is not straightforward without using particular techniques; one of them is to adopt a weighted sum of the components.
We define the fitness function as a weighted sum of a number of components:

$$F = w_1 C + w_2 H_u + w_3 H_o + w_4 U, \qquad \sum_{i=1}^{4} w_i = 1, \quad (3)$$
$$C = \frac{\sum_{j=1}^{N_M} P_{G_{x_j}} O_{x_j j}}{N_M}, \quad (5)$$

$$P_i = \sum_{j=1}^{N_G} O_{ij}, \qquad i = 1, \ldots, N_M. \quad (6)$$
$$m_{ij} = \begin{cases} 1, & \text{if } j = \arg\max_{j' = 1, \ldots, N_M} O_{ij'}, \\ 0, & \text{otherwise}. \end{cases} \quad (7)$$
The handicap H_u accounts for the number of undersegmented regions (those which appear in the resulting image as a whole whilst separated in the ground truth image):

$$H_u = k \cdot \#\Big\{ R_{Mj} : \sum_{i=1}^{N_G} m_{ij} > 1,\; j = 1, \ldots, N_M \Big\}. \quad (8)$$
$$H_o = k \cdot \#\Big\{ R_{Mj} : \sum_{i=1}^{N_G} m_{ij} = 0,\; j = 1, \ldots, N_M \Big\}. \quad (9)$$
The handicaps Ho and Hu are both multiplied by a constant k just to enlarge the variability range.
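With the assignment matrix m of equation (7), the handicaps of equations (8) and (9) reduce to counts over the columns of m; a sketch (names illustrative; m[i][j] = 1 when ground-truth region i maps to machine region j):

```python
def handicaps(m, k=10.0):
    """H_u counts machine regions claimed by more than one ground-truth
    region (undersegmentation); H_o counts machine regions claimed by
    none.  Both are scaled by the constant k, as in eqs. (8)-(9)."""
    n_g, n_m = len(m), len(m[0])
    col = [sum(m[i][j] for i in range(n_g)) for j in range(n_m)]
    h_u = k * sum(1 for c in col if c > 1)
    h_o = k * sum(1 for c in col if c == 0)
    return h_u, h_o

# 3 ground-truth regions, 3 machine regions: regions 0 and 1 both map to
# machine region 0, and nothing maps to machine region 2
m = [[1, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
assert handicaps(m) == (10.0, 10.0)
```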
Some results about the eectiveness of the adopted fitness
function have been presented in [35].
3.4. Coding the chromosomes
One of the main tasks in GASE was to code the chromosome,
that is, to code the parameter set for a given segmentation
algorithm.
4. EXPERIMENTAL RESULTS
The experiments carried out on GASE used as benchmarks the Michigan State University/Washington State University synthetic image database (which we will refer to as the MSU/WSU database, http://sampl.eng.ohio-state.edu/sampl/data/3DDB/RID/index.htm) and a subset of the University of Bern real database (referred to as ABW). The
tests performed are very time consuming since each segmentation process is iterated for a single experiment many times
(i.e., for each individual of the solution population and for
each generation).
Since we tested our GA with both a fixed and a random number of children per crossover, according to [30], we have to use an alternative definition of generation. The term generation in GAs is often used as a synonym of the iteration step and is related to the process of creating a new solution. In our case, a generation step is given by the results obtained in a fixed time slice. In this manner, we can establish a time slice as a function of the reference workstation; for instance, with a standard PC (AMD Duron 700 MHz) running Linux OS, we could define the time slice as one minute of computation. In order to compare the efficacy and efficiency of the results, we define a convergence trend as the maximum time needed to reach the optimal solution within a given Max_G generations.
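The time-slice definition of a generation can be sketched like this (the stub evaluator and the timings are illustrative; in GASE a slice would be, e.g., one minute on the reference workstation):

```python
import time

def run_generation(evaluate, candidates, slice_seconds, clock=time.monotonic):
    """A 'generation' is whatever set of candidate evaluations fits in a
    fixed time slice, rather than a fixed number of offspring."""
    results, deadline = [], clock() + slice_seconds
    for c in candidates:
        if clock() >= deadline:
            break  # the time slice is exhausted; stop this generation
        results.append((c, evaluate(c)))
    return results

# stub evaluator: effectively instantaneous, so every candidate fits
done = run_generation(lambda c: c * c, [1, 2, 3], slice_seconds=0.05)
assert done == [(1, 1), (2, 4), (3, 9)]
```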
4.1.
The first experiment was the tuning of the UB segmentation algorithm [7]. This algorithm initially tries to detect the edges (jump and crease [37]) of the image being segmented by analyzing its scan lines. After finding the candidates for area borders, it performs an edge-filling process. This
[Table: parameters of the USF segmentation algorithm]
Parameter   Meaning        Range
N           WINSIZE        2-12
T_point     MAXPTDIST      0-
T_perp      MAXPERPDIST    0-
T_angle     MAXANGLE       0.0-180.0
T_area      MINREGPIX      0-

[Table: parameters of the UB segmentation algorithm]
Parameter    Variable type   Range
Th_toleran   float           0.5-15.0
Th_length    int             3-
Th_jump      float           1.0-20.0
Th_crease    float           0.0-180.0
Th_area      int             0-
Th_morph     float           1.0-3.0
Th_PRMSE     float           0.1-10.0
Th_Pavgerr   float           0.05-10.0
Th_CRMSE     float           0.1-10.0
Th_Cavgerr   float           0.05-10.0

The ranges are limited according to the observed lack of meaning of greater values when segmenting MSU/WSU images, so the limits shown are narrower than what is possible.
segmentation algorithm is capable of segmenting curved surfaces and the available version [38] can segment images of the
GRF2-K2T database (named after the brand and model of
the structured light scanner used). We used a version, slightly modified at the University of Modena, which is also able to segment synthetic images of the MSU/WSU database. A set of 35 images was chosen, and a tuning task as in [6] was executed.
While tuning done this way should provide very good results, it is our opinion that a training set should not be too large. We therefore chose a subset of 6 images as our training set. This set was input to GASE, and the resulting parameter sets were used to segment the test set (formed by the remaining 29 images) and to find the most suitable set.
We fixed our generation at 1 minute and the maximum number of generations at 30, that is to say, about 30 minutes of computation for every image of the training set. It took a total of about 3 hours to obtain 6 possible solutions and to select the most suitable one for the test set. During this time our algorithm performed about 10000 segmentations of the images. An exhaustive search would have to explore the whole enormous solution space (the space has 10 dimensions, and one parameter potentially ranges from 0 to ∞) for all the instances of the test set. In our case, the exhaustive search was substi-
[Tables: hand-tuned versus GASE-optimized parameter values for the UB algorithm (Th_SegmToler, Th_Jump, Th_Crease, Th_PRMSE, Th_PAvErr, Th_CRMSE, Th_CAvErr, Th_PostprFact, Th_SegmLen, Th_RegArea; average fitness 15.96 versus 15.04) and for the USF algorithm (WINSIZE, MAXPTDIST, MAXPERPDIST, MAXANGLE, MINREGPIX).]
4.2.
The second experiment was performed on the USF segmentation algorithm [6]. Based on a region growing strategy, it computes the normal vector for each pixel within a
parametric-sized window. After that first computation, it selects seed points on the basis of a reliability measure. From
these seed points, it accomplishes the region growing, aggregating surfaces until at least one of four parametric criteria
is met. This segmentation algorithm has been tuned using a
set of parameters proposed by its authors. As we can see in
[6], the given results are very impressive, so we knew how difficult it would be to improve them. Nevertheless, we performed the following experiment: given the original training
Table 6: Average results of the USF segmentation algorithm with the original opt. val. and the GASE opt. val. on 10 ABW images at 80% compare tolerance (we recall that the tool measures segmentation algorithm performance with respect to a certain precision tolerance, ranging from 51 to 95%).

Parameter set   GT regions   Correct detection   Angle diff. (std)   Oversegmentation   Undersegmentation   Missed   Noise
Original        20.1         13.1                1.24 (0.96)         0.1                0.0                 6.9      2.8
GASE            20.1         12.9                1.27 (0.99)         0.1                0.0                 7.1      3.7
inal opt. val. have an opposite behavior, since GASE produces fewer undersegmentation errors but more oversegmentation. Finally, the last two plots show that there is no noticeable difference in noise segmentation and miss-segmentation.
5.
[Figure 4 panels: average numbers of oversegmentation, noise, and missed instances, each plotted against the compare tool tolerance (%) for the HE and GA parameter sets.]
(e) Average missed regions on 10 ABW images.
Figure 4: Results, as measured by the comparison tool, obtained by the original opt. val. (labeled HE) and GASE opt. val. (labeled GA)
on 10 images of the ABW database.
[18] M. Srinivas and L. M. Patnaik, "Genetic algorithms: a survey," IEEE Computer, vol. 27, no. 6, pp. 17-26, 1994.
[19] X. Yu, T. D. Bui, and A. Krzyzak, "Robust estimation for range image segmentation and reconstruction," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 5, pp. 530-538, 1994.
[20] D. N. Chun and H. S. Yang, "Robust image segmentation using genetic algorithm with a fuzzy measure," Pattern Recognition, vol. 29, no. 7, pp. 1195-1211, 1996.
[21] P. Andrey and P. Tarroux, "Unsupervised image segmentation using a distributed genetic algorithm," Pattern Recognition, vol. 27, no. 5, pp. 659-673, 1994.
[22] M. Yoshimura and S. Oe, "Evolutionary segmentation of texture image using genetic algorithms towards automatic decision of optimum number of segmentation areas," Pattern Recognition, vol. 32, no. 12, pp. 2041-2054, 1999.
[23] D.-C. Tseng and C.-C. Lai, "A genetic algorithm for MRF-based segmentation of multi-spectral textured images," Pattern Recognition Letters, vol. 20, no. 14, pp. 1499-1510, 1999.
[24] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, vol. 48, no. 3, pp. 259-302, 1986.
[25] S. Cagnoni, A. B. Dobrzeniecki, R. Poli, and J. C. Yanch, "Genetic algorithm-based interactive segmentation of 3D medical images," Image and Vision Computing, vol. 17, no. 12, pp. 881-895, 1999.
[26] P. Andrey, "Selectionist relaxation: genetic algorithms applied to image segmentation," Image and Vision Computing, vol. 17, no. 3-4, pp. 175-187, 1999.
[27] L. S. Davis and A. Rosenfeld, "Cooperating processes for low-level vision: a survey," Artificial Intelligence, vol. 17, no. 1-3, pp. 245-263, 1981.
[28] K. I. Laws, "The Phoenix image segmentation system: description and evaluation," Tech. Rep. 289, SRI International, Menlo Park, Calif, USA, December 1982.
[29] J. T. Alander, "Indexed bibliography of genetic algorithms in optics and image processing," Tech. Rep. 94-1-OPTICS, Department of Information Technology and Production Economics, University of Vaasa, Vaasa, Finland, 2000.
[30] L. Cinque, R. Cucchiara, S. Levialdi, S. Martinz, and G. Pignalberi, "Optimal range segmentation parameters through genetic algorithms," in Proc. IEEE International Conference on Pattern Recognition (ICPR '00), pp. 1474-1477, Barcelona, Spain, September 2000.
[31] L. J. Eshelman, R. A. Caruana, and J. D. Schaffer, "Biases in the crossover landscape," in Proc. 3rd International Conference on Genetic Algorithms, J. D. Schaffer, Ed., pp. 10-19, Fairfax, Va, USA, June 1989.
[32] G. Syswerda, "Uniform crossover in genetic algorithms," in Proc. 3rd International Conference on Genetic Algorithms, J. D. Schaffer, Ed., pp. 2-9, Fairfax, Va, USA, June 1989.
[33] M. D. Levine and A. M. Nazif, "An experimental rule-based system for testing low level segmentation strategies," in Multicomputers and Image Processing: Algorithms and Programs, K. Preston and L. Uhr, Eds., pp. 149-160, Academic Press, New York, NY, USA, 1982.
[34] Y. W. Lim and S. U. Lee, "On the color image segmentation algorithm based on the thresholding and the fuzzy C-means techniques," Pattern Recognition, vol. 23, no. 9, pp. 935-952, 1990.
[35] L. Cinque, R. Cucchiara, S. Levialdi, and G. Pignalberi, "A methodology to award a score to range image segmentation," in Proc. 6th International Conference on Pattern Recognition and Information Processing, pp. 171-175, Minsk, Belarus, May 2001.
[36] F. Herrera, M. Lozano, and J. L. Verdegay, "Tackling real-coded genetic algorithms: operators and tools for behavioural analysis," Artificial Intelligence Review, vol. 12, no. 4, pp. 265-319, 1998.
[37] R. Hoffman and A. K. Jain, "Segmentation and classification of range images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, no. 5, pp. 608-620, 1987.
[38] Range image segmentation comparison project, 2002, http://marathon.csee.usf.edu/range/seg-comp/results.html.
[39] L. Cinque, R. Cucchiara, S. Levialdi, and G. Pignalberi, "A decision support system for range image segmentation," in Proc. 3rd International Conference on Digital Information Processing and Control in Extreme Situations, pp. 45-50, Minsk, Belarus, May 2002.
Gianluca Pignalberi received his degree in computer science, focusing especially on image processing and artificial intelligence methods, from the University of Rome La Sapienza in 2000. He is a consultant, and his current interests include language recognition and data compression techniques, combined with artificial intelligence methods.
Rita Cucchiara graduated magna cum laude with the Laurea in electronic engineering from the University of Bologna in 1989 and received the Ph.D. in computer engineering from the University of Bologna in 1993. She was an Assistant Professor at the University of Ferrara and has been an Associate Professor in computer engineering at the Faculty of Engineering of Modena, University of Modena and Reggio Emilia, Italy, since 1998. Her research activity includes computer vision and pattern recognition, in particular image segmentation, genetic algorithms for optimization, motion analysis, and color analysis. She is currently involved in research projects on video surveillance, domotics, video transcoding for high-performance video servers, and support to medical diagnosis with image analysis. Rita Cucchiara is a member of the IEEE, ACM, GIRPR (Italian IAPR), and AIxIA.
Luigi Cinque received his Ph.D. degree in physics from the University of Napoli in 1983. From 1984 to 1990, he was with the Laboratory of Artificial Intelligence (Alenia SpA). Presently, he is a Professor at the Department of Computer Science of the University of Rome La Sapienza. His scientific interests cover image sequence analysis, shape and object recognition, image databases, and advanced man-machine interaction. Professor Cinque is presently an Associate Editor of Pattern Recognition and Pattern Recognition Letters. He is a senior member of IEEE, ACM, and IAPR. He has been on the program committee of many international conferences in the field of imaging technology, and he is the author of over 100 scientific publications in international journals and conference proceedings.
Parameter Estimation of a Plucked String Synthesis Model Using a Genetic Algorithm with Perceptual Fitness Calculation

Janne Riionheimo and Vesa Välimäki

Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, FIN-02015 HUT, Espoo, Finland
Pori School of Technology and Economics, Tampere University of Technology, P.O. Box 300, FIN-28101 Pori, Finland
Email: vesa.valimaki@hut.fi
Received 30 June 2002 and in revised form 2 December 2002
We describe a technique for estimating the control parameters of a plucked string synthesis model using a genetic algorithm. The model has been used intensively for sound synthesis of various string instruments, but the fine-tuning of the parameters has been carried out with a semiautomatic method that requires some hand adjustment with human listening. This paper describes an automated method for extracting the parameters from recorded tones. The calculation of the fitness function utilizes knowledge of the properties of human hearing.
Keywords and phrases: sound synthesis, physical modeling synthesis, plucked string synthesis, parameter estimation, genetic
algorithm.
1. INTRODUCTION
Model-based sound synthesis is a powerful tool for creating natural sounding tones by simulating the sound production mechanisms and physical behavior of real musical instruments. These mechanisms are often too complex to simulate in every detail, so simplified models are used for synthesis. The aim is to generate a model whose output is perceptually indistinguishable from the real instrument.
One workable method for physical modelling synthesis is based on the digital waveguide theory proposed by Smith [1]. In the case of plucked string instruments, the method can be extended to model also the plucking style and the instrument body [2, 3]. A synthesis model of this kind can be applied to synthesize various plucked string instruments by changing the control parameters and using different body and plucking models [4, 5]. A characteristic feature of string instrument tones is the double decay and beating effect [6], which can be implemented by using two slightly mistuned string models in parallel to simulate the two polarizations of the transversal vibratory motion of a real string [7].
Parameter estimation is an important and difficult challenge in sound synthesis. Natural parameter settings are usually in great demand at the initial state of the synthesis: when using these parameters with a model, we are able to produce real-sounding instrument tones. Various methods for adjusting the parameters to produce the desired sounds have been proposed in the literature [4, 8, 9, 10, 11, 12]. An automated parameter calibration method for a plucked string synthesis model has been proposed in [4, 8] and later improved in [9]. It gives estimates for the fundamental frequency, the decay parameters, and the excitation signal which is used in commuted synthesis.
Our interest in this paper is the parameter estimation of
the model proposed by Karjalainen et al. [7]. The parameters
of the model have earlier been calibrated automatically, but
the fine-tuning has required some hand adjustment. In this
work, we use recorded tones as a target sound with which the
synthesized tones are compared. All synthesized sounds are
then ranked according to their similarity with the recorded
tone. An accurate way to measure sound quality from the
viewpoint of auditory perception would be to carry out listening tests with trained participants and rank the candidate solutions according to the data obtained from the tests [13]. This method is extremely time-consuming and, therefore, we are forced to use analytical methods to calculate the quality of the solutions. Various techniques to simulate human hearing and calculate perceptual quality exist. The perceptual linear predictive (PLP) technique is widely used with speech signals [14], and frequency-warped digital signal processing is used to implement perceptually relevant audio applications [15].
In this work, we use an error function that simulates human hearing and calculates the perceptual error between the tones. Frequency masking behavior, frequency dependence, and other limitations of human hearing are taken into account. From the optimization point of view, the task is to find the global minimum of the error function. The variables of the function, that is, the parameters of the synthesis model, span the parameter space where each point corresponds to a set of parameters and thus to a synthesized sound. When dealing with discrete parameter values, the number of parameter sets is finite and given by the product of the numbers of possible values of each parameter. Using nine control parameters with 100 possible values each, a total of 10^18 combinations exist in the space and, therefore, an exhaustive search is obviously impossible.
Evolutionary algorithms have shown good performance in optimization problems related to the parameter estimation of synthesis models. Vuori and Välimäki [16] tried a simulated evolution algorithm for a flute model, and Horner et al. [17] proposed an automated system for parameter estimation of an FM synthesizer using a genetic algorithm (GA). GAs have been used for automatically designing sound synthesis algorithms in [18, 19]. In this study, a GA is used to optimize the perceptual error function.
This paper is organized as follows. The plucked string synthesis model and the control parameters to be estimated are described in Section 2. The parameter estimation problem and methods for solving it are discussed in Section 3. Section 4 concentrates on the calculation of the perceptual error. In Section 5, we discretize the parameter space in a perceptually reasonable manner. The implementation of the GA and the different schemes for selection, mutation, and crossover used in our work are surveyed in Section 6. Experiments and results are analyzed in Section 7, and conclusions are finally drawn in Section 8.
2. SYNTHESIS MODEL

[Figure 1: Block diagram of the synthesis model. The excitation x(n) from the excitation database is distributed to the horizontal and vertical string models Sh(z) and Sv(z) by the mixing coefficients mp and 1 - mp; the string outputs are combined with the coefficients mo and 1 - mo, the gain gc couples the two polarizations, and y(n) is the output. The labels z^(-LI), F(z), and H(z) belong to the single string model.]
The model consists of two string models Sh(z) and Sv(z) that simulate the effect of the two polarizations of the transversal vibratory motion. A single string model S(z) in Figure 2 consists of a lowpass filter H(z) that controls the decay rate of the harmonics, a delay line z^(-LI), and a fractional delay filter F(z). The delay time around the loop for a given fundamental frequency f0 is

    Ld = fs / f0,   (1)

where fs is the sampling rate. The loop filter is the one-pole lowpass filter

    H(z) = g (1 + a) / (1 + a z^(-1)),   (2)

where g is the loop gain and a is the frequency-dependent gain parameter. The transfer function of the complete model can be written as

    M(z) = mp mo Sh(z) + (1 - mp)(1 - mo) Sv(z) + mp (1 - mo) gc Sh(z) Sv(z),   (3)
Table 1: Control parameters of the synthesis model.

f0,h : Fundamental frequency of the horizontal string model
f0,v : Fundamental frequency of the vertical string model
gh   : Loop gain of the horizontal string model
ah   : Frequency-dependent gain of the horizontal string model
gv   : Loop gain of the vertical string model
av   : Frequency-dependent gain of the vertical string model
mp   : Input mixing coefficient
mo   : Output mixing coefficient
gc   : Coupling gain of the two polarizations
where the string models Sh(z) and Sv(z) for the two polarizations can each be written as an individual string model

    S(z) = 1 / (1 - z^(-LI) F(z) H(z)).   (4)
3. PARAMETER ESTIMATION

Determination of the proper parameter values for sound synthesis systems is an important problem, and it also depends on the purpose of the synthesis. When the goal is to imitate the sounds of real instruments, the aim of the estimation is unambiguous: we wish to find a parameter set which gives a sound output that is sufficiently similar to the natural one in terms of human perception. These parameters are also feasible for virtual instruments at the initial stage, after which the limits of real instruments can be exceeded by adjusting the parameters in more creative ways.
Parameters of a synthesis model correspond normally
to the physical characteristics of an instrument [7]. The
estimation procedure can then be seen as sound analysis
where the parameters are extracted from the sound or from
the measurements of physical behavior of an instrument
[23]. Usually, the model parameters have to be fine-tuned by laborious trial-and-error experiments, in collaboration with accomplished players [23]. Parameters for the synthesis model in Figure 1 have earlier been estimated this way and, recently, in a semiautomatic fashion, where some parameter values can be obtained with an estimation algorithm while others must be guessed. Another approach is to consider the parameter estimation problem as a nonlinear optimization process and take advantage of general search methods. All possible parameter sets can then be ranked according to their similarity with the desired sound.
3.1. Optimization
Instead of extracting the parameters from audio measurements, our approach here is to find the parameter set that
produces a tone that is perceptually indistinguishable from
the target one. Each parameter set can be assigned with a
quality value which denotes how good the candidate solution is. This performance metric is usually called a fitness function or, inversely, an error function. A parameter set is fed into the fitness function, which calculates the error between the corresponding synthesized tone and the desired sound. The smaller the error, the better the parameter set and the higher the fitness value. These functions give a numerical grade to each solution, by means of which we are able to rank all possible parameter sets.
4. FITNESS CALCULATION
The tones are compared in the frequency domain. The short-time spectrum of the mth frame is computed as

    X(m, k) = Σ_{n=0}^{N-1} x(n + mN) e^(-j ωk n),   m = 0, 1, 2, . . . ,   (5)

with

    ωk = 2πk / N,   k = 0, 1, 2, . . . , N - 1,   (6)
and the error between the output spectrum O(m, k) and the target spectrum T(m, k) is then

    E = (1 / (F L)) Σ_{m=0}^{L-1} Σ_{k=0}^{N-1} |O(m, k) - T(m, k)|²,   (7)

where L is the number of frames and F is the number of frequency bins.
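A direct implementation of the frame-by-frame squared spectral error of (7) might look as follows; the framing and normalization details are assumptions, since the original text around the equation is fragmentary, and the function name is hypothetical.

```python
import numpy as np

def spectral_error(output, target, frame_len=1024):
    """Mean squared error between short-time magnitude spectra of two
    tones, cf. (7): averaged over the L frames and F frequency bins."""
    n_frames = min(len(output), len(target)) // frame_len
    err = 0.0
    for m in range(n_frames):
        seg_o = output[m * frame_len:(m + 1) * frame_len]
        seg_t = target[m * frame_len:(m + 1) * frame_len]
        O = np.abs(np.fft.rfft(seg_o))   # magnitude spectrum O(m, k)
        T = np.abs(np.fft.rfft(seg_t))   # magnitude spectrum T(m, k)
        err += np.sum((O - T) ** 2)
    n_bins = frame_len // 2 + 1
    return err / (n_frames * n_bins)
```

Identical signals give zero error, and the value grows as the spectra diverge, which is the ranking behavior the fitness function needs.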
The frequency scale is mapped onto the Bark scale by

    ν(f) = 13 arctan(0.76 f / kHz) + 3.5 arctan( (f / 7.5 kHz)² ),   (8)
where f is the frequency in Hertz and ν is the mapped frequency in Bark units. The energy in each critical band is calculated by summing the frequency components in the critical band. The number of critical bands depends on the sampling rate and is 25 for the sample rate of 44.1 kHz. The discrete representation of fixed critical bands is a close approximation and, in reality, each band builds up around a narrow band excitation. A power spectrum P(k) and the energy per critical band Z(ν) for a 12-millisecond excerpt from a guitar tone are shown in Figure 3a.
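The Bark mapping of (8) and the summation of spectral energy into critical bands can be sketched as follows. The helper names are hypothetical; assigning each bin to the band obtained by rounding its Bark value down to an integer reproduces the 25 bands quoted for 44.1 kHz.

```python
import numpy as np

def hz_to_bark(f):
    """Map frequency in Hz to the Bark scale, cf. (8)."""
    return 13.0 * np.arctan(0.76 * f / 1000.0) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_band_energies(power_spectrum, fs):
    """Sum power-spectrum bins into critical bands Z(nu)."""
    n = len(power_spectrum)
    freqs = np.arange(n) * (fs / 2.0) / (n - 1)   # bin frequencies of an rfft
    bands = hz_to_bark(freqs).astype(int)          # critical band index per bin
    energy = np.zeros(bands.max() + 1)
    for b, p in zip(bands, power_spectrum):
        energy[b] += p
    return energy
```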
The effect of masking of each narrow band excitation spreads across all critical bands. This is described by a spreading function given in [31],

    10 log10 B(ν) = 15.91 + 7.5 (ν + 0.474) - 17.5 √(1 + (ν + 0.474)²) dB.   (9)
The tonality of the signal is described by the tonality factor

    α = min(V / Vmax, 1),   (10)
(a) Power spectrum (solid line) and energy per critical band (dashed line).

(c) Power spectrum (solid line) and spread energy per critical band (dashed line).
Figure 3: Determining the threshold of masking for a 12 milliseconds excerpt from a recorded guitar tone. Fundamental frequency of the
tone is 331 Hz.
where Vmax = 60 dB. That is to say, if the masker signal is entirely tonelike, then α = 1, and if the signal is pure noise, then α = 0. The tonality factor is used to geometrically weight the two thresholds mentioned above to form the masking energy offset U(ν) for a critical band:

    U(ν) = α (14.5 + ν) + 5.5 (1 - α).   (11)
The offset is subtracted from the spread critical band energy to give the raw masking threshold

    R(ν) = 10^(log10 S(ν) - U(ν) / 10),   (12)

which is normalized as

    Rn(ν) = R(ν) / Np,   (13)
where Np is the number of points in the particular critical band. The final threshold of masking for a frequency spectrum W(k) is calculated by comparing the normalized threshold to the absolute threshold of hearing and mapping from Bark back to the frequency scale. The most sensitive area in human hearing is around 4 kHz. If the normalized threshold is below the absolute threshold of hearing, the absolute threshold is used instead.
The error components are weighted according to their audibility with the binary weight

    G(m, k) = 1, if the component at (m, k) is above the threshold of masking W(k), and 0 otherwise,   (14)
and the perceptual error is calculated as

    E = (1 / Fp) Σ_{k=0}^{N-1} Ws(k) (1 / L) Σ_{m=0}^{L-1} [ |O(m, k) - T(m, k)|² G(m, k) + |O(m, k) - T(m, k)|² H(m, k) ],   (15)
where Ws (k) is an inverted equal loudness curve at sound
pressure level of 60 dB shown in Figure 4 that is used to
weight the error and imitate the frequency-dependent sensitivity of human hearing.
where H(m, k) is the complementary binary weight (1 where G(m, k) = 0, and 0 otherwise).

[Figure 4: Inverted equal loudness curve at a sound pressure level of 60 dB.]

5. DISCRETIZATION OF THE PARAMETER SPACE

The parameter ranges can be reduced to cover only the possible musical tones, and the deviation steps can be kept just below the discrimination threshold.
5.1. Decay parameters
[Figure 5: (a) Discrete values for the parameter g when f0 = 331 Hz and the variation of the time constant is 10%. (b) Discrete values for the parameter a when the variation is 7%. (c) Amplitude envelopes of tones with different discrete values of g.]
If the fundamental frequencies of the two polarizations differ, the frequency estimate settles in the middle of the frequencies, as shown in Figure 6. Frequency discrimination thresholds as a function of frequency have been proposed in [33]. The audibility of beating and amplitude modulation has also been studied in [27]. These results do not directly give us the discrimination thresholds for the difference between the fundamental frequencies of the two-polarization string model, because the fluctuation strength in an output sound depends on the fundamental frequencies and the decay parameters g and a.
The sensitivity of the parameters can be examined when a synthesized tone with known parameter values is used as a target tone with which another synthesized tone is compared. By varying one parameter at a time and freezing the others, we obtain the error as a function of the parameters. In
Figure 7, the target values of f0,v and f0,h are 331 and 330 Hz.
The solid line shows the error when f0,v is linearly swept from
327 to 344 Hz. The global minimum is obviously found when
rp =
which is shown in Figure 8.
10
(16)
[Figure 6: Waveforms of 80 Hz and 84 Hz components and of their sum; the maxima of the summed signal follow the mean frequency (f0,v + f0,h)/2.]

[Figure 7: Error as a function of the fundamental frequency f0; the target values are f0,v = 331 Hz and f0,h = 330 Hz.]

[Figure 8: Frequency resolution rp as a function of the frequency estimate f0.]
6. GENETIC ALGORITHM
[Figure 9: Error as a function of the gain parameters mp, mo, and gc, with the target values indicated.]

In the selection scheme, the individuals are ranked according to their fitness, and the probability of selecting the rth best individual is

    P(r) = q' (1 - q)^(r-1),   (17)

where

    q' = q / (1 - (1 - q)^Sp),   (18)

q is the probability of selecting the best individual, and Sp is the size of the population.
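Under the assumption that (17) and (18) describe the normalized geometric ranking selection familiar from GA toolboxes (a reconstruction, since the surrounding text is fragmentary), the scheme can be sketched as:

```python
import random

def normalized_geometric_select(population, fitnesses, q=0.08):
    """Rank-based selection, cf. (17)-(18): the individual of rank r
    (rank 1 = highest fitness) is picked with probability q'(1 - q)^(r-1),
    with q' = q / (1 - (1 - q)^Sp) so that the probabilities sum to one."""
    sp = len(population)
    q_norm = q / (1.0 - (1.0 - q) ** sp)
    # sort indices by fitness, best first
    order = sorted(range(sp), key=lambda i: fitnesses[i], reverse=True)
    u = random.random()
    cumulative = 0.0
    for rank, idx in enumerate(order, start=1):
        cumulative += q_norm * (1.0 - q) ** (rank - 1)
        if u <= cumulative:
            return population[idx]
    return population[order[-1]]  # guard against floating-point round-off
```

By construction the rank probabilities sum to one, so the scheme never stalls even when q is small.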
The GA improves the characteristics of the individuals from generation to generation. Each individual, called a chromosome, is made up of an array of genes that contain, in our case, the actual parameters to be estimated.
In the original algorithm design, the chromosomes were represented with binary numbers [35]. Michalewicz [36] showed that representing the chromosomes with floating-point numbers results in a faster, more consistent, higher-precision, and more intuitive algorithm. We use a GA with the floating-point representation, although the parameter space is discrete, as discussed in Section 5. We have also experimented with the binary-number representation, but the execution time of the iteration becomes slow. The nonuniformly graduated parameter space is transformed into the uniform scales on which the GA operates. The floating-point numbers are rounded to the nearest discrete values.
In arithmetic crossover, an offspring xo is a weighted combination of the two parents xp,1 and xp,2,

    xo = h (xp,1 - xp,2) + xp,2,   (19)

where h is a uniformly distributed random number between 0 and 1.
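Arithmetic crossover as in (19) can be sketched as follows; generating a second child with the complementary weight 1 - h is a common convention and an assumption here, as is the function name.

```python
import random

def arithmetic_crossover(parent1, parent2):
    """Arithmetic crossover, cf. (19): each offspring gene is the convex
    combination x_o = h*(x_p1 - x_p2) + x_p2 with a uniform random weight
    h in [0, 1]; the second child uses the complementary weight 1 - h."""
    h = random.random()
    child1 = [h * (x1 - x2) + x2 for x1, x2 in zip(parent1, parent2)]
    child2 = [(1.0 - h) * (x1 - x2) + x2 for x1, x2 in zip(parent1, parent2)]
    return child1, child2
```

Because both children lie on the line segment between the parents, the operator never leaves the feasible box spanned by the parent parameter vectors.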
(6) Mutation: randomly pick a specified number of individuals for mutation. Uniform, nonuniform, multi-nonuniform, and boundary mutation schemes are used. Mutation works with a single individual at a time. Uniform mutation sets a randomly selected parameter (gene) to a uniform random number between the boundaries. Nonuniform mutation operates uniformly at an early stage and more locally as the current generation approaches the maximum generation. We have defined the scheme to operate in such a way that the change is always at least one discrete step. The degree of nonuniformity is controlled with the parameter b; nonuniformity is important for fine-tuning. Multi-nonuniform mutation changes all of the parameters in the current individual. Boundary mutation sets a parameter to one of its boundaries and is useful if the optimal solution is supposed to lie near the boundaries of the parameter space. Boundary mutation is used in special cases, such as staccato tones.
(7) Replace the current population with the new one.
(8) Repeat steps 3, 4, 5, 6, and 7 until termination.
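A minimal sketch of the nonuniform mutation described in step (6), after Michalewicz [36]; note that the text above additionally forces the change to be at least one discrete step, which this sketch omits, and the function name is hypothetical.

```python
import random

def nonuniform_mutate(x, lo, hi, gen, max_gen, b=3.0):
    """Nonuniform mutation: the perturbation can span the whole range
    early in the run and shrinks towards zero as gen approaches max_gen;
    the parameter b controls the degree of nonuniformity."""
    r = random.random()
    shrink = 1.0 - r ** ((1.0 - gen / max_gen) ** b)
    if random.random() < 0.5:
        return x + (hi - x) * shrink   # move towards the upper boundary
    return x - (x - lo) * shrink       # move towards the lower boundary
```

At gen = max_gen the shrink factor is exactly zero, so the operator degenerates to the identity, which is the fine-tuning behavior described above.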
Our algorithm is terminated when a specified number of generations has been produced. The number of generations thus defines the maximum duration of the algorithm. In our case, the time spent on the GA operations is negligible compared to the synthesis and fitness calculation. Synthesis of a tone with candidate parameter values takes approximately 0.5 seconds, while the error calculation takes 1.2 seconds. This makes 1.7 seconds in total for a single parameter set.
7. EXPERIMENTS AND RESULTS
parameters gh, gv, and av. The adjacent point in the discrete grid is estimated for the decay parameter ah. As can be seen in Figure 7, the sensitivity of the mean frequency is negligible compared to the difference df, which might be the cause of deviations in the mean frequency. Differences in the mixing parameters mo, mp, and the coupling coefficient gc can be noticed. When running the algorithm multiple times, no explicit optima for the mixing and coupling parameters were found. However, synthesized tones produced by the corresponding parameter values are indistinguishable. That is to say, the parameters mp, mo, and gc are not orthogonal, which is clearly a problem with the model and also impairs the efficiency of our parameter estimation algorithm.
To overcome the nonorthogonality problem, we have run the algorithm with constant values of mp = mo = 0.5 in experiment 2. If the target parameters are set according to the discrete grid, the exact parameters with zero error are estimated. The convergence of the parameters and the error in such a case is shown in Figure 11. Apart from the fact that the parameter values are estimated precisely, the convergence of the algorithm is very fast. Zero error is already found in generation 87.
A similar behavior is noticed in experiment 3, where an extracted excitation is used for resynthesis. The difference df and the decay parameters gh and gv are again estimated precisely. Parameters mp, mo, and gc drift as in the previous experiment. Interestingly, mp = 1, which means that the straight path to the vertical polarization is totally closed. The model is, in a manner of speaking, rearranged in such a way that the individual string models are in series as opposed to the original construction where the polarizations are arranged in parallel. Unlike in experiments 1 and 2, the exact parameter values are not so relevant, since different excitation signals are used for the target and estimated tones. Rather than looking into the parameter values, it is better to analyze the tones produced with the parameters. In Figure 12, the overall temporal envelopes and the envelopes of the first eight partials for the target and for the estimated tone are presented. As can be seen, the overall temporal envelopes are almost identical and the partial envelopes match well. Only the beating amplitude differs slightly, but the difference is inaudible. This indicates that the parametrization of the model itself is not the best possible, since similar tones can be synthesized with various parameter sets.
Our estimation method is designed to be used with real recorded tones. Time and frequency analysis for such a case is shown in Figure 13. As can be seen, the overall temporal envelopes and the partial envelopes for a recorded tone are very similar to those analyzed from a tone that uses estimated parameter values. Appraisal of the perceptual quality of synthesized tones is left as a future project, but our informal listening indicates that the quality is comparable with or better than that of our previous methods, and the new method does not require any hand tuning after the estimation procedure. Sound clips demonstrating these experiments are available at http://www.acoustics.hut.fi/publications/papers/jasp-ga.
Figure 11: Convergence of the seven parameters and the error for experiment 2 in Table 2. The mixing coefficients are frozen as mp = mo = 0.5 to overcome the nonorthogonality problem. One hundred and fifty generations are shown, and the original excitation is used for the resynthesis.
Table 2: Original and estimated parameters when a synthesized tone with known parameter values is used as the target tone. The original excitation is used for resynthesis in experiments 1 and 2, and the extracted excitation is used for resynthesis in experiment 3. In experiment 2, the mixing coefficients are frozen as mp = mo = 0.5.

Parameter   Target parameter   Experiment 1   Experiment 2   Experiment 3
f0          330.5409           331.000850     330.5409       330.00085
df          0.8987             0.8987         0.8987         0.8987
gh          0.9873             0.9873         0.9873         0.9873
ah          0.2905             0.3108         0.2905         0.2071
gv          0.9907             0.9907         0.9907         0.9907
av          0.1936             0.1936         0.1936         0.1290
mp          0.5                0.2603         (0.5)          1.000
mo          0.5                0.6971         (0.5)          0.8715
gc          0.1013             0.2628         0.1013         0.2450
Error       -                  0.0464         0              0.4131
Figure 12: Time and frequency analysis for experiment 3 in Table 2. The synthesized target tone is produced with known parameter values
and the synthesized tone uses estimated parameter values. Extracted excitation is used for the resynthesis.
Figure 13: Time and frequency analysis for a recorded tone and for a synthesized tone that uses estimated parameter values. Extracted
excitation is used for the resynthesis. Estimated parameter values are f0 = 331.1044, d f = 1.1558, gh = 0.9762, ah = 0.4991, gv = 0.9925,
av = 0.0751, m p = 0.1865, mo = 0.7397, and gc = 0.1250.
We expected to hear good results using the GA-based method, which was also the case.
Appraisal of synthetic tones that use parameter values
from the proposed GA-based method is left as a future
project. Listening tests similar to those used for evaluating
high-quality audio coding algorithms may be useful for this
task.
REFERENCES
[1] J. O. Smith, Physical modeling using digital waveguides, Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[2] J. O. Smith, Efficient synthesis of stringed musical instruments, in Proc. International Computer Music Conference (ICMC 93), pp. 64–71, Tokyo, Japan, September 1993.
[3] M. Karjalainen, V. Välimäki, and Z. Jánosy, Towards high-quality sound synthesis of the guitar and string instruments, in Proc. International Computer Music Conference (ICMC 93), pp. 56–63, Tokyo, Japan, September 1993.
[4] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, Physical modeling of plucked string instruments with application to real-time sound synthesis, Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.
[5] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare, Methods for modeling realistic playing in acoustic guitar synthesis, Computer Music Journal, vol. 25, no. 3, pp. 38–49, 2001.
[6] G. Weinreich, Coupled piano strings, Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.
[7] M. Karjalainen, V. Välimäki, and T. Tolonen, Plucked-string models: from the Karplus-Strong algorithm to digital waveguides and beyond, Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.
[8] T. Tolonen and V. Välimäki, Automated parameter extraction for plucked string synthesis, in Proc. International Symposium on Musical Acoustics (ISMA 97), pp. 245–250, Edinburgh, Scotland, August 1997.
[9] C. Erkut, V. Välimäki, M. Karjalainen, and M. Laurson, Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar, in the Audio Engineering Society 108th International Convention, Paris, France, February 2000, preprint 5114, http://lib.hut.fi/Diss/2002/isbn9512261901.
[10] A. Nackaerts, B. De Moor, and R. Lauwereins, Parameter estimation for dual-polarization plucked string models, in Proc. International Computer Music Conference (ICMC 01), pp. 203–206, Havana, Cuba, September 2001.
[11] S.-F. Liang and A. W. Y. Su, Recurrent neural-network-based physical model for the chin and other plucked-string instruments, Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1045–1059, 2000.
[12] C. Drioli and D. Rocchesso, Learning pseudo-physical models for sound synthesis and transformation, in Proc. IEEE International Conference on Systems, Man, and Cybernetics, pp. 1085–1090, San Diego, Calif, USA, October 1998.
[13] V.-V. Mattila and N. Zacharov, Generalized listener selection (GLS) procedure, in the Audio Engineering Society 110th International Convention, Amsterdam, The Netherlands, 2001, preprint 5405.
[14] H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738–1752, 1990.
[15] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. Laine, and J. Huopaniemi, Frequency-warped signal processing for
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
Optimization and Assessment of Wavelet Packet Decompositions with Evolutionary Computation

Andreas Uhl

Department of Scientific Computing, University of Salzburg, Jakob Haringer Street 2, A-5020 Salzburg, Austria
Email: uhl@cosy.sbg.ac.at
Received 30 June 2002 and in revised form 27 November 2002
In image compression, the wavelet transformation is a state-of-the-art component. Recently, wavelet packet decomposition has received considerable interest. A popular approach for wavelet packet decomposition is the near-best-basis algorithm using nonadditive cost functions. In contrast to the case of additive cost functions, the wavelet packet decomposition found by the near-best-basis algorithm is only suboptimal. We apply methods from the field of evolutionary computation (EC) to test the quality of the near-best-basis results. We observe a phenomenon: the results of the near-best-basis algorithm are inferior in terms of cost-function optimization but superior in terms of rate/distortion performance compared to those of the EC methods.
Keywords and phrases: image compression, wavelet packets, best basis algorithm, genetic algorithms, random search.
1. INTRODUCTION

2. COST FUNCTIONS
As a preliminary, we review the definitions of a cost function and of additivity. A cost function is a function C : R^(M×N) → R. If y ∈ R^(M×N) is a matrix of wavelet coefficients and C is a cost function, then C(0) = 0 and C(y) = Σ_{i,j} C(y_ij). A cost function Ca is additive if and only if

    Ca(z1 ∪ z2) = Ca(z1) + Ca(z2)   (1)

for disjoint coefficient sets z1 and z2.
(i) Coifman-Wickerhauser entropy. It is defined as

    C(y) = - Σ_{i,j: p_ij ≠ 0} p_ij ln p_ij,   (2)

where

    p_ij = y_ij² / ||y||².   (3)
From the definition of the weak l p norm, we deduce that unfavorable slowly decreasing sequences or, in the worst case, uniform sequences of vectors z cause high numerical values of the norm, whereas fast decreasing z's result in low ones.
(iii) Shannon entropy. Below, we consider the matrix y simply as a collection of real-valued coefficients xi, 1 ≤ i ≤ MN. The matrix y is rearranged such that the first row is concatenated with the second row at its right side, then the new row is concatenated with the third row, and so on. With a simple histogram binning method, we estimate the probability mass function. The sample data interval is given by a = min_i x_i and b = max_i x_i. Given the number of bins J, the bin width w is w = (b - a)/J. The frequency f_j for the jth bin is defined by f_j = #{x_i | x_i ≤ a + jw} - Σ_{k=1}^{j-1} f_k.
The probabilities p_j are calculated from the frequencies f_j simply by p_j = f_j / (MN). From the obtained class probabilities, we can calculate the Shannon entropy [14]

    Cn2,J(y) = - Σ_{j=1}^{J} p_j log2 p_j.   (4)
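The two entropy cost functions of (2)-(3) and (4) can be sketched as follows; the function names are hypothetical, and the histogram variant relies on NumPy's equal-width binning, which matches the bin-width definition w = (b - a)/J.

```python
import numpy as np

def cw_entropy(y):
    """Coifman-Wickerhauser entropy, cf. (2) and (3): p_ij = y_ij^2/||y||^2,
    C(y) = -sum p_ij ln p_ij, taken over the nonzero p_ij only."""
    y = np.asarray(y, dtype=float)
    total = np.sum(y ** 2)
    if total == 0.0:
        return 0.0
    p = (y ** 2 / total).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def histogram_entropy(y, n_bins):
    """Shannon entropy via histogram binning, cf. (4): equal-width bins
    over [min, max], probabilities p_j = f_j / (M*N), entropy in bits."""
    x = np.asarray(y, dtype=float).ravel()      # row-concatenated matrix
    counts, _ = np.histogram(x, bins=n_bins)    # frequencies f_j
    p = counts / x.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```

Both functions are small for energy concentrated in few coefficients and large for spread-out energy, which is what the basis search exploits.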
NBB ALGORITHM
With additive cost functions, a dynamic programming approach, that is, the BB algorithm [13], provides the optimal WP decomposition with respect to the applied cost function. Basically, the BB algorithm traverses the quadtree in a depth-first-search manner and starts at the level right above the leaves of the decomposition quadtree. The sum of the costs of the children nodes is compared to the cost of the parent node. If the sum is less than the cost of the parent node, the situation remains unchanged. But if the cost of the parent node is less than the cost of the children, then the child nodes are pruned off the tree. From the bottom upwards, the tree is reduced whenever the cost of a certain branch can be reduced. An illustrating example is presented in [15]. It is an essential property of the BB algorithm that the decomposition tree is optimal in terms of the cost criterion, but not in terms of the obtained r/d performance.
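The bottom-up pruning step of the BB algorithm can be sketched with a recursive function over the decomposition quadtree; the dictionary representation of nodes here is purely illustrative.

```python
def best_basis(node, cost):
    """Bottom-up best-basis pruning: a node's children are kept only when
    the sum of their subtree costs is lower than the cost of the parent's
    own coefficient block.

    A node is a dict with the coefficient block under 'coeffs' and an
    optional list of four sub-nodes under 'children'. Returns the pruned
    node together with the cost of the retained decomposition.
    """
    own = cost(node['coeffs'])
    children = node.get('children')
    if not children:
        return node, own
    pruned = [best_basis(child, cost) for child in children]
    child_cost = sum(c for _, c in pruned)
    if child_cost < own:
        return {'coeffs': node['coeffs'],
                'children': [n for n, _ in pruned]}, child_cost
    return {'coeffs': node['coeffs']}, own   # prune the children off the tree
```

With an additive cost function this recursion yields the optimal subtree; with a nonadditive one it corresponds to the merely suboptimal NBB variant discussed next.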
When switching from additive to nonadditive cost functions, the locality of the cost-function evaluation is lost. The BB algorithm can still be applied, because the correlation among the subbands is assumed to be minor, but obviously the result is only suboptimal. Hence, instead of BB, this new variant is called NBB [14].
4.
Each node k of the decomposition quadtree carries one bit bk of the genotype string,

    bk = 1: decompose,
         0: stop.   (5)

If the bit at index k is set (bk = 1), the indices of the resulting four subbands are derived by

    km = 4k + m,   1 ≤ m ≤ 4.   (6)

The decomposition level l of node k is given by

    l = 0 for k = 0 (root),
    l = l' with Σ_{r=0}^{l'-1} 4^r ≤ k < Σ_{r=0}^{l'} 4^r for k > 0.   (7)
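The index arithmetic of (6) and (7) can be sketched directly; the function names are hypothetical.

```python
def child_indices(k):
    """Indices of the four subbands created by decomposing node k, cf. (6)."""
    return [4 * k + m for m in range(1, 5)]

def level(k):
    """Decomposition level of node k in the linearized quadtree, cf. (7):
    the root (k = 0) has level 0, and node k > 0 lies on level l when
    sum(4^r for r < l) <= k < sum(4^r for r <= l)."""
    if k == 0:
        return 0
    l, lower = 1, 1                  # lower = sum_{r=0}^{l-1} 4^r for l = 1
    while not (lower <= k < lower + 4 ** l):
        lower += 4 ** l
        l += 1
    return l
```

For example, the root's children are the nodes 1-4 on level 1, and their children occupy the indices 5-20 on level 2.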
GENETIC ALGORITHM
Genetic algorithms (GAs) are evolution-based search algorithms especially designed for parameter optimization problems with vast search spaces. GAs were first proposed in the seventies by Holland [17]. Generally, parameter optimization problems consist of an objective function used to evaluate and estimate the quality of an admissible parameter set, that is, a solution of the problem (not necessarily the optimal one, just any solution). For the GA, the parameter set needs to be encoded into a string over a finite alphabet (usually a binary alphabet). The encoded parameter set is called a genotype. Usually, the objective function is slightly modified to meet the requirements of the GA and is hence called a fitness function. The fitness function determines the quality (fitness) of each genotype (encoded solution). The combination of a genotype and the corresponding fitness forms an individual. At the start of an evolution process, an initial population, which consists of a fixed number of individuals, is generated randomly. In a selection process, individuals of high fitness are selected for recombination. The selection scheme mimics nature's principle of the survival of the fittest. During recombination, two individuals at a time exchange genetic material, that is, parts of the genotype strings are exchanged at random. After a new intermediate population has been created, a mutation operator is applied. The mutation operator randomly changes some of the alleles (values at certain positions/loci of the genotype).
Tree crossover
[Figure: PSNR over generations for NBB (weak l1 norm), RS, and GA with tournament selection (t = 2).]

[Figure: Tree crossover example: genotype bits and decomposition trees of (a) individual A and (b) individual B.]
RANDOM SEARCH
[Figure: the offspring decomposition trees after crossover, (a) Individual A′ and (b) Individual B′.]

Figure 5: Barbara.
As with the GA, we can apply the RS using PSNR instead of cost functions to evaluate WP decompositions. Using an RS as discussed above, with a decomposition depth of
at least 4 for the approximation subband, we generate 4000
almost unique samples of WP decompositions and evaluate
the corresponding PSNR. The WP decomposition with the
highest PSNR value is recorded. We have repeated the single
RS runs at least 90 times. The best three results in decreasing order, and the worst result, of a single RS run for the image
Barbara are as follows: 24.648, 24.6418, 24.6368,
. . . , 24.4094.
If we compare the results of the RS to those obtained by
NBB with the cost function weak l1 norm (PSNR 25.47), we see that the RS is about 1 dB below the NBB algorithm. To
increase the probability of a high-quality RS result, a
drastic increase of the sample size is required, which in turn
would cause a tremendous increase in RS runtime.
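The RS procedure just described can be sketched as follows. The quadtree sampler and the scoring function are our simplifications (a real run would decode each tree, compress the image, and measure PSNR; here a stand-in score is used):

```python
import random

def random_wp_decomposition(max_level, rng):
    """Randomly grow a quadtree of split decisions: subband k in the
    returned set means b_k = 1, i.e., k is split into 4k+1..4k+4.
    (A sketch of the decompositions sampled by the RS.)"""
    split = set()
    frontier = [(0, 0)]                       # (subband index, level)
    while frontier:
        k, lvl = frontier.pop()
        # always split the root once; deeper splits are coin flips
        if lvl < max_level and (lvl == 0 or rng.random() < 0.5):
            split.add(k)
            frontier.extend((4 * k + m, lvl + 1) for m in range(1, 5))
    return frozenset(split)

def random_search(evaluate, samples=4000, seed=0):
    """Draw (almost) unique decompositions and keep the best-scoring one."""
    rng = random.Random(seed)
    seen, best_score, best = set(), float("-inf"), None
    for _ in range(samples):
        d = random_wp_decomposition(4, rng)
        if d in seen:
            continue                          # "almost unique" samples
        seen.add(d)
        s = evaluate(d)
        if s > best_score:
            best_score, best = s, d
    return best_score, best

# toy stand-in for PSNR evaluation: prefer decompositions with more splits
score, tree = random_search(len, samples=200)
```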
7.
[Figure: scatter plots of PSNR versus Coifman-Wickerhauser entropy for random WPs, with a zoomed view comparing NBB, RS, and GA: TS (t = 2).]
[Figure: scatter plots of PSNR versus the weak l1 norm and versus Shannon entropy, comparing NBB, RS, and GA: TS (t = 2).]
for this phenomenon. As a result, the correlation of the cost-function value and the PSNR, as indicated in all three scatter plots, is imperfect. (In the case of perfect correlation, we
would observe a line starting at the right and descending to
the left.)
The NBB algorithm generates WP decompositions according to split-and-combine decisions based on cost-function evaluations. In contrast, RS and GA generate a complete WP decomposition and compute the cost-function value afterwards. The overall cost-function values of NBB,

SUMMARY

REFERENCES
[5] R. Oktem, L. Oktem, and K. Egiazarian, "Wavelet based image compression by adaptive scanning of transform coefficients," Journal of Electronic Imaging, vol. 2, no. 11, pp. 257–261, 2002.
[6] Z. Xiong, K. Ramchandran, and M. T. Orchard, "Wavelet packet image coding using space-frequency quantization," IEEE Trans. Image Processing, vol. 7, no. 6, pp. 892–898, 1998.
[7] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, 1996.
[8] K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. Image Processing, vol. 2, no. 2, pp. 160–175, 1993.
[9] N. M. Rajpoot, R. G. Wilson, F. G. Meyer, and R. R. Coifman, "A new basis selection paradigm for wavelet packet image coding," in Proc. International Conference on Image Processing (ICIP '01), pp. 816–819, Thessaloniki, Greece, October 2001.
[10] T. Hopper, "Compression of gray-scale fingerprint images," in Wavelet Applications, H. H. Szu, Ed., vol. 2242 of SPIE Proceedings, pp. 180–187, Orlando, Fla, USA, 1994.
[11] T. Schell and A. Uhl, "Customized evolutionary optimization of subband structures for wavelet packet image compression," in Advances in Fuzzy Systems and Evolutionary Computation, N. Mastorakis, Ed., pp. 293–298, World Scientific Engineering Society, Puerto de la Cruz, Spain, February 2001.
[12] T. Schell and A. Uhl, "New models for generating optimal wavelet-packet-tree-structures," in Proc. 3rd IEEE Benelux Signal Processing Symposium (SPS '02), pp. 225–228, IEEE Benelux Signal Processing Chapter, Leuven, Belgium, March 2002.
[13] R. R. Coifman and M. V. Wickerhauser, "Entropy based algorithms for best basis selection," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 713–718, 1992.
[14] C. Taswell, "Satisficing search algorithms for selecting near-best bases in adaptive tree-structured wavelet transforms," IEEE Transactions on Signal Processing, vol. 44, no. 10, pp. 2423–2438, 1996.
[15] M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A. K. Peters, Wellesley, Mass, USA, 1994.
[16] C. Taswell, "Near-best basis selection algorithms with nonadditive information cost functions," in Proc. IEEE International Symposium on Time-Frequency and Time-Scale Analysis (TFTS '94), M. Amin, Ed., pp. 13–16, IEEE Press, Philadelphia, Pa, USA, October 1994.
[17] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Ann Arbor, Mich, USA, 1975.
[18] T. Schell and S. Wegenkittl, "Looking beyond selection probabilities: adaption of the χ2 measure for the performance analysis of selection methods in GAs," Evolutionary Computation, vol. 9, no. 2, pp. 243–256, 2001.
[19] J. E. Baker, "Adaptive selection methods for genetic algorithms," in Proc. 1st International Conference on Genetic Algorithms and Their Applications, J. J. Grefenstette, Ed., pp. 101–111, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, July 1985.
[20] T. Schell, Evolutionary optimization: selection schemes, sampling and applications in image processing and pseudo random number generation, Ph.D. thesis, University of Salzburg, Salzburg, Austria, 2001.
[21] R. Kutil, "A significance map based adaptive wavelet zerotree codec (SMAWZ)," in Media Processors 2002, S. Panchanathan, V. Bove, and S. I. Sudharsanan, Eds., vol. 4674 of SPIE Proceedings, pp. 61–71, San Jose, Calif, USA, January 2002.
Douglas O'Shaughnessy
INRS-Énergie-Matériaux-Télécommunications, Université du Québec, 800 de la Gauchetière Ouest,
place Bonaventure, Montréal, Canada H5A 1K6
Email: dougo@inrs-telecom.uquebec.ca
Received 14 June 2002 and in revised form 6 December 2002
Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech
recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axes issued from the KLT. The
enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique,
when included in the front-end of an HTK-based CSR system, outperforms the conventional recognition process in severe
interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to 4 dB. We also show
the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.
Keywords and phrases: speech recognition, genetic algorithms, Karhunen-Loève transform, hidden Markov models, robustness.
1.
INTRODUCTION
(1)
2.2.
(2)
The mismatch between the training and the testing environments leads to a worse estimate for the likelihood of o given
λ and thus degrades CSR performance. Reducing this mismatch should increase the correct recognition rate. The mismatch can be viewed by considering the signal space, the
feature space, or the model space. We are concerned with the
feature space, and consider a transformation T that maps it
into a transformed feature space. Our approach is to find T
and the phone sequence w that maximize the joint likelihood of o and w given λ:

[T*, w*] = argmax_{T,w} p(o | w, T, λ) p(w).   (3)
C_n = Σ_{k=1}^{M} X_k cos( π n (k − 0.5) / M ),   n = 1, 2, . . . , N,   (4)
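Equation (4) is a discrete cosine transform of the M log mel-filterbank outputs X_1, ..., X_M into N cepstral coefficients; a direct sketch (function name is ours):

```python
import math

def mfcc(log_mel, N):
    """Cepstral coefficients per (4):
    C_n = sum_{k=1}^{M} X_k cos(pi * n * (k - 0.5) / M), n = 1..N."""
    M = len(log_mel)
    return [sum(x * math.cos(math.pi * n * (k + 0.5) / M)  # k is 0-based here
                for k, x in enumerate(log_mel))
            for n in range(1, N + 1)]
```

A flat filterbank (all X_k equal) yields zero for every coefficient n ≥ 1, since the cosine terms cancel pairwise.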
In order to reduce the effects of noise on ASR, many methods propose to decompose the vector space of the noisy signal
into a signal-plus-noise subspace and a noise subspace [10].
We remove the noise subspace and estimate the clean signal
from the remaining signal space. Such a decomposition applies the KLT to the noisy zero-mean normalized data.
816
MFC analysis
Clean speech
Enhanced
MFCC
a22
a11
a33
KLT
S2
S1
decomposition
a13
Recognition
S3
a23
HMM
MFC analysis
Noisy speech
correlation matrix R = E[C C^T]. C can be represented as a linear combination of the eigenvectors
φ1, φ2, . . . , φr, which correspond to the eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λr ≥ 0, respectively. That is, C
can be calculated using the following orthogonal transformation:

C = Σ_{k=1}^{r} α_k φ_k,   k = 1, . . . , r,   (5)

Ĉ = Σ_{k=1}^{r} W_k α_k φ_k,   k = 1, . . . , r,   (6)

Ĉ = Σ_{k=1}^{N} α_k φ_k,   k = 1, . . . , N.   (7)

Determining an optimal r is not needed, since the GA considers the vectors φ1, φ2, . . . , φN as the fittest individuals for the
complete space dimension N. This process can be regarded
as the mapping transform, T, of (3).
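The KLT step with the weighted reconstruction of (6) can be sketched as follows. This is our simplification: the weights W_k are simply given here, whereas in the paper they are what the GA optimizes:

```python
import numpy as np

def klt_enhance(frames, weights):
    """Eigen-decompose the correlation matrix of zero-mean feature
    vectors, then reconstruct each vector from weighted projections
    onto the principal axes, as in (6). (A sketch; W_k are given.)"""
    X = frames - frames.mean(axis=0)       # zero-mean normalization
    R = X.T @ X / len(X)                   # correlation matrix estimate
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]      # lambda_1 >= ... >= lambda_N
    Phi = eigvecs[:, order]                # principal axes as columns
    alpha = X @ Phi                        # projection coefficients
    return (alpha * weights) @ Phi.T       # weighted reconstruction

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 12))        # 100 frames of 12 features
same = klt_enhance(frames, np.ones(12))    # W_k = 1 reproduces the input
```

With all weights equal to one, the orthonormal basis reconstructs the zero-mean data exactly; unequal weights attenuate the axes associated with the noise subspace.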
3.
Solution representation
(8)

where

q' = q / ( 1 − (1 − q)^P ),   (9)

with q the probability of selecting the best individual and P the population size.

3.3.1. Crossover

3.3.2. Mutation

Mutation operators tend to make small random changes in
an attempt to explore all regions of the solution space [16].
The principle of the nonuniform mutation used in our application consists of randomly selecting one component, x_k, of
an individual and setting it equal to a nonuniform random
number,¹ x'_k:

x'_k = x_k + (b_k − x_k) f(Gen)   if u1 < 0.5,
x'_k = x_k − (x_k − a_k) f(Gen)   if u1 ≥ 0.5,   (10)

where

f(Gen) = ( u2 ( 1 − Gen / Gen_max ) )^t,   (11)

¹ Otherwise,

3.4. Evaluation function

d(C, Ĉ) = ( Σ_{k=1}^{N} | C_k − Ĉ_k |^l )^{1/l}.   (12)
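The mutation rule (10)-(11) and the distance (12) can be sketched directly; function names, the RNG handling, and defaults are ours:

```python
import random

def nonuniform_mutation(x, bounds, gen, gen_max, t=2.0, rng=None):
    """Nonuniform mutation per (10)-(11): pick one component x_k and move
    it toward its upper bound b_k or lower bound a_k by a step that
    shrinks as the generation counter approaches gen_max."""
    rng = rng or random.Random()
    k = rng.randrange(len(x))
    a_k, b_k = bounds[k]
    u1, u2 = rng.random(), rng.random()
    f = (u2 * (1.0 - gen / gen_max)) ** t      # eq. (11)
    y = list(x)
    if u1 < 0.5:
        y[k] = x[k] + (b_k - x[k]) * f         # eq. (10), first case
    else:
        y[k] = x[k] - (x[k] - a_k) * f         # eq. (10), second case
    return y

def minkowski_distance(C, C_hat, l=2):
    """Evaluation distance per (12): d(C, Chat) = (sum |C_k - Chat_k|^l)^(1/l)."""
    return sum(abs(a - b) ** l for a, b in zip(C, C_hat)) ** (1.0 / l)
```

Note that f(Gen) vanishes at Gen = Gen_max, so mutations become ever smaller late in the run.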
[Figure 2 panels: best-individual performance versus generation (0-300) for each of the first four axes.]

Figure 2: Evolution of the performance of the best individual during 300 generations. Only the first four axes are considered among the twelve.
EXPERIMENTS
Parameter                               Value
Number of generations                   300
Population size                         150
Probability of selecting the best, q    0.08
Heuristic crossover rate                0.25
Multi-nonuniform mutation rate          0.06
Number of runs                          50
Number of frames                        114331
Boundaries [a_i, b_i]                   [−1.0, +1.0]
[Figure 3 panels (a)-(d): % recognition rate versus SNR (dB) for the Baseline, KLT, and KLT-GA systems.]

Figure 3: Percent word recognition performance (%CWrd) of the KLT- and KLT-GA-based CSR systems compared to the baseline HTK method (noisy MFCC) using (a) 1-mixture, (b) 2-mixture, (c) 4-mixture, and (d) 8-mixture triphones for different values of SNR.
[Figure 4 panels: first, second, third, and fourth MFCC versus frame number.]

Figure 4: Comparison between clean, noisy, and enhanced MFCCs, represented by solid, dotted, and dashed-dotted lines, respectively.
CONCLUSION
[Tables (a)-(d): word recognition rate %CWrd for the MFCC_D_A, KLT-MFCC_D_A, and KLT-GA-MFCC_D_A front ends using 1-, 2-, 4-, and 8-mixture triphone models.]
a large amount of data in order to find the best individual. Many other directions remain open for further work.
Present goals include analyzing evolved genetic parameters,
evaluating how performance scales with other types of noise
(nonstationary, limited band, etc.).
REFERENCES
[1] Y. Gong, "Speech recognition in noisy environments: a survey," Speech Communication, vol. 16, no. 3, pp. 261–291, 1995.
[2] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.
[3] D. Mansour and B. H. Juang, "A family of distortion measures based upon projection operation for robust speech recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 11, pp. 1659–1671, 1989.
[4] S. B. Davis and P. Mermelstein, "Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357–366, 1980.
[5] H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, "RASTA-PLP speech analysis technique," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 121–124, San Francisco, Calif, USA, March 1992.
[6] J. Hernando and C. Nadeu, "A comparative study of parameters and distances for noisy speech recognition," in Proc. Eurospeech '91, pp. 91–94, Genova, Italy, September 1991.
[7] C. R. Reeves and S. J. Taylor, "Selection of training data for neural networks by a genetic algorithm," in Parallel Problem Solving from Nature, pp. 633–642, Springer-Verlag, Amsterdam, The Netherlands, September 1998.
[8] A. Spalanzani, S.-A. Selouani, and H. Kabre, "Evolutionary algorithms for optimizing speech data projection," in Genetic and Evolutionary Computation Conference, p. 1799, Orlando, Fla, USA, July 1999.
[9] D. O'Shaughnessy, Speech Communications: Human and Machine, IEEE Press, Piscataway, NJ, USA, 2nd edition, 2000.
[10] Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, no. 4, pp. 251–266, 1995.
[11] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.
[12] J. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[13] L. B. Booker, D. E. Goldberg, and J. H. Holland, "Classifier systems and genetic algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235–282, 1989.
[14] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, AI Series, Springer-Verlag, New York, NY, USA, 1992.
[15] C. R. Houk, J. A. Joines, and M. G. Kay, "A genetic algorithm for function optimization: a Matlab implementation," Tech. Rep. 95-09, North Carolina State University, Raleigh, NC, USA, 1995.
[16] L. Davis, Ed., The Genetic Algorithm Handbook, chapter 17, Van Nostrand Reinhold, New York, NY, USA, 1991.
[17] B. H. Juang, L. R. Rabiner, and J. G. Wilpon, "On the use of bandpass liftering in speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 765–768, Tokyo, Japan, April 1986.
[18] W. M. Fisher, G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database: specifications and status," in Proc. DARPA Speech Recognition Workshop, pp. 93–99, Palo Alto, Calif, USA, February 1986.
[19] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz, "NTIMIT: a phonetically balanced, continuous speech telephone bandwidth speech database," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 109–112, Albuquerque, NM, USA, April 1990.
[20] P. J. Moreno and R. M. Stern, "Sources of degradation of speech recognition in the telephone network," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 109–112, Adelaide, Australia, April 1994.
[21] X. D. Huang, F. Alleva, H. W. Hon, M. Y. Hwang, K. F. Lee, and R. Rosenfeld, "The SPHINX-II speech recognition system: an overview," Computer Speech and Language, vol. 7, no. 2, pp. 137–148, 1993.
[22] Cambridge University Speech Group, The HTK Book (Version 2.1.1), Cambridge University Group, March 1997.
[23] L. R. Bahl, P. V. de Souza, P. S. Gopalakrishnan, D. Nahamoo, and M. A. Picheny, "Decision trees for phonological rules in continuous speech," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 185–188, Toronto, Canada, May 1991.
[24] W. D. Gaylor, Telephone Voice Transmission: Standards and Measurements, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.
David M. Holloway
Mathematics Department, British Columbia Institute of Technology, Burnaby, British Columbia, Canada V5G 3H2
Chemistry Department, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z1
Email: david holloway@bcit.ca
Received 10 July 2002 and in revised form 1 December 2002
Understanding how genetic networks act in embryonic development requires a detailed and statistically significant dataset integrating diverse observational results. The fruit fly (Drosophila melanogaster) is used as a model organism for studying developmental genetics. In recent years, several laboratories have systematically gathered confocal microscopy images of patterns of
activity (expression) for genes governing early Drosophila development. Due to both the high variability between fruit fly embryos
and diverse sources of observational errors, some new nontrivial procedures for processing and integrating the raw observations
are required. Here we describe processing techniques based on genetic algorithms and discuss their efficacy in decreasing observational errors and illuminating the natural variability in gene expression patterns. The specific developmental problem studied is
anteroposterior specification of the body plan.
Keywords and phrases: image processing, elastic deformations, genetic algorithms, observational errors, variability, fluctuations.
1.
INTRODUCTION
Sources of variability in our images can be roughly subdivided into natural embryo variability in size and shape, natural expression pattern variability, errors of image processing
procedures, experimental errors (fixation, dyeing), observational errors (confocal scanning), and the molecular noise of
expression machinery.
Figure 2: Embryos of the same time class and the same length
have different expression patterns. Eve stripes differ in spacing and
overall domain along the anteroposterior (AP, x-) axis, and show
stripe curvature in the dorsoventral (DV, y-) direction.
2.3.
Scanning error
After the above processing, images still have variability in fluorescence intensity due to experimental conditions. With image processing, we can address experimental or observational
[Figure 3 surface plot: fluorescence versus AP position and DV axis (%).]

Figure 3: An example of the systematic DV distortion of an expression surface, with the gene Krüppel.
errors which have a systematic character. Due to the ellipsoidal geometry of the egg, nuclei in the center of the image
(along the AP axis) are closer to the microscope objective and
look brighter than nuclei at the top and bottom of the image.
Intensity shows a DV dependence (Figure 3). The brightness
depends (roughly) quadratically on DV distance from the AP
midline. We flatten this DV bias by a procedure of expression
surface stretching.
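Under the stated assumption of a roughly quadratic DV brightness bias, the flattening step can be sketched as a polynomial detrend. This is our simplification of the paper's surface-stretching procedure, with hypothetical names:

```python
import numpy as np

def flatten_dv_bias(intensity, dv):
    """Fit I ~ c0 + c1*d + c2*d^2 over DV distance d from the AP
    midline and subtract the fitted trend, preserving the mean level.
    (A sketch of the DV-bias flattening described above.)"""
    coeffs = np.polyfit(dv, intensity, deg=2)
    trend = np.polyval(coeffs, dv)
    return intensity - trend + intensity.mean()

# synthetic example: bright midline, quadratic falloff, small noise
dv = np.linspace(-50, 50, 101)
raw = 100 - 0.01 * dv**2 + np.random.default_rng(1).normal(0, 1, dv.size)
flat = flatten_dv_bias(raw, dv)
```

After detrending, what remains is the noise around the fitted surface rather than the systematic geometric falloff.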
Figure 4 summarizes the three steps of image processing
which follow the scaling: stripe straightening, stripe registration, and expression surface stretching. The details of the
processing techniques are in Section 3.
After image processing, we can generate an integrated
dataset and begin to address questions regarding the segmentation patterning dynamics. We are pursuing two problems initially. First, we are visualizing the maturation of the
expression patterns for all segmentation genes over the patterning period. Second, since we have removed many of the
sources of variability in the images, what remains should be
largely indicative of intrinsic, molecular scale fluctuations in
protein concentrations. We are comparing relative noise levels within the segmentation signaling hierarchy. These are
some of the first tests of theoretical predictions for noise
propagation in segmentation signaling [7, 8]. In general,
both of these approaches should provide tests of existing theories for segment patterning.
3.
METHODS
Figure 4: Steps for processing large sets of images to obtain an integrated dataset of segmentation pattern dynamics (a pair of images
used in this example). Stripe straightening minimizes the DV contribution to the AP patterning. Stripe registration minimizes the
variability in AP stripe positioning. Expression surface stretching
minimizes systematic observational errors in the DV direction.
Complete registration is achieved by sequential application of the polynomial transformations (1) and (2) to pairs of
images. Complete registration within each time class relative
to a starting image (the time class exemplar) gives sets of images suitable for constructing integrated datasets. If we then
compare results across time classes, we are able to visualize
detailed pattern dynamics over cell cycle 14.
The starting images in each time class, the time class exemplars, were chosen in the following way: the distance
between each (stripe-straightened) image and every other
(stripe-straightened) image in a time class was calculated
using the registration cost function (see Section 3.3). These
costs were summed for each image, and the image with the
lowest total cost was used as the starting image. All other images in the time class were registered to this image. The starting image was unaffected by the registration transformation
[6].
We perform (fluorescence intensity) surface stretching to
decrease DV distortion using the following polynomial:
Z' = Z + C1 Y + C2 Y^2 + C3 XY + C4 Y^3 + C5 XY^2 + C6 X^2 Y,   (3)
(1)
where x = w − w0, y = h − h0, w and h are initial spatial coordinates, and w0, h0, A, B, C, and D are parameters.
The y-coordinate remains the same while the x-coordinate is
transformed as a function of both coordinates w and h (for
details, see [6, 15, 16]). The parameters w0 , h0 , A, B, C, and
D for each image are found by means of GAs.
Our pairwise image registration procedure is the next
step in the sequential transformation of the x-coordinate. We
use the following polynomial for x':
x' = c0 + c1 x + c2 x^2 + c3 x^3 + c4 x^4 + c5 x^5,
(2)
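The two polynomial maps, (2) for registration and (3) for surface stretching, can be written directly (function names are ours; the coefficients are what the GA searches for):

```python
def register_x(x, c):
    """Pairwise registration per (2): x' = c0 + c1*x + ... + c5*x^5."""
    return sum(ck * x ** k for k, ck in enumerate(c))

def stretch_z(X, Y, Z, C):
    """Surface stretching per (3):
    Z' = Z + C1*Y + C2*Y^2 + C3*X*Y + C4*Y^3 + C5*X*Y^2 + C6*X^2*Y.
    Every term carries a factor of Y, so the AP midline (Y = 0) is fixed."""
    C1, C2, C3, C4, C5, C6 = C
    return Z + C1*Y + C2*Y**2 + C3*X*Y + C4*Y**3 + C5*X*Y**2 + C6*X**2*Y
```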
Optimization by GAs
4.
As discussed in the introduction, fluorescence intensity measurements demonstrate high variability and are subject to diverse observational and experimental errors. Our aim with
the image processing is to decrease some of the observational
and experimental errors and to help distinguish these from the
natural variability which we would like to study (i.e., characterization of the stochastic nature of molecular processes in
this gene network). We will discuss the efficacy of the image
processing by comparing initial and residual variability in
our data.
4.1.
3.3.4 Implementation
4.2.
Σ_j ( Z_j − Z_{j+1} )^2 + Σ_j ( 2 Z_j − Z_{j+1} − Z_{j−1} )^2.   (4)
[Figure 7 panels (a)-(e): fluorescence versus AP position (% egg length) and DV position.]

Figure 7: Superposition of about a hundred images for eve gene expression from time class 8 (late cycle 14). (a) Superposition of all eve expression surfaces after the stripe straightening and registration. (b) Variability of expression profiles for gene eve after the stripe-straightening procedure. (c) Mean intensity at each AP position, with standard deviation error bars, for the expression profiles from (b). (d) Residual variability for the same dataset after stripe straightening and registration. (e) Mean intensity with standard deviation error bars for the expression profiles from (d). These have decreased significantly with stripe registration. Data for the 1D profiles is extracted from 10% (DV) longitudinal strips (e.g., Figure 6, center strip). Cubic spline interpolation was used to display discrete data.
expression surface for such an embryo looks like a half ellipsoid (Figures 8a and 8b). The fluorescence level at the edges
of the image is about 20 arbitrary units, while in the center it
is about 60 units. (The expression surface follows the geometry of the embryo as illustrated in Figure 1b.) Even in eve null
mutants, background fluorescence shows this distortion.
[Figure 8 panels (a)-(d): (X, Y, Z) expression surfaces and scatter plots before and after stretching.]

Figure 8: Surface stretching transformation. (a) and (b) Experimental expression surface and scatter plot for a truly uniform distribution of the eve gene product. (c) and (d) Expression surface and scatter plot after surface stretching, minimizing the systematic errors in intensity data.
The stretching procedure transforms the expression surface along the DV, y-axis (Figures 8c and 8d). Minimizing
the systematic observational error in this direction gives us a
chance to directly observe nucleus-to-nucleus variability in a
single embryo (Figure 8c).
5.
We have found heuristic optimization procedures (transformations (1), (2), and (3)) to be a simple and effective way to
reduce observational errors in embryo images. This reduction of variability allows us to focus on the variability intrinsic to the system.
[Figure: expression surface (fluorescence versus AP and DV position) and stripe-numbered profiles for the genes gt, hb, kni, eve, and hairy.]
late cycle pattern is well covered in the literature, but the details of the early dynamics are not so well characterized.
All five genes show a movement towards the middle of
the embryo, with anterior expression domains moving posteriorly and posterior domains moving anteriorly. In more
detail, the small anterior domain of knirps (white arrowhead)
appears to move posteriorly at the same speed as eve stripe 1
(also marked by white arrowhead). It appears that we can see
interactions between hb and gt in the posterior: a posterior
gt peak forms first, but as posterior hb forms, the gt peak
moves anteriorly. This interaction appears to be reflected in
the movement of stripe 7 of eve and h (black arrowheads).
We hope that further study of the correlation between expression domains over cycle 14 and observation of the fine
gene-specific details of domain dynamics will serve to test
theories of pattern formation in Drosophila segmentation.
[Figure 11 panels (a) and (b): fluorescence versus AP position (% egg length).]

Figure 11: Eve and bcd fluorescence scatterplots and profiles (early cycle 14, time class 1), sampled from a 50% DV longitudinal strip. (a) Scatterplots after stripe straightening and surface stretching. Each dot is the intensity for a single nucleus. (b) Curves of mean intensity at each AP position, with standard deviation error bars.

ACKNOWLEDGMENT

The work of AS is supported by the USA National Institutes of Health, Grant RO1-RR07801, INTAS Grant 97-30950, and RFBR Grant 00-04-48515.
REFERENCES
[1] M. Akam, "The molecular basis for metameric pattern in the Drosophila embryo," Development, vol. 101, no. 1, pp. 1–22, 1987.
[2] P. A. Lawrence, The Making of a Fly, Blackwell Scientific Publications, Oxford, UK, 1992.
[3] B. Houchmandzadeh, E. Wieschaus, and S. Leibler, "Establishment of developmental precision and proportions in the early Drosophila embryo," Nature, vol. 415, no. 6873, pp. 798–802, 2002.
[4] M. Keijzer, J. J. Merelo, G. Romero, and M. Schoenauer, "Evolving objects: a general purpose evolutionary computation library," in Proc. 5th Conference on Artificial Evolution (EA-2001), P. Collet, C. Fonlupt, J.-K. Hao, E. Lutton, and M. Schoenauer, Eds., number 2310 in Springer-Verlag Lecture Notes in Computer Science, pp. 231–244, Springer-Verlag, Le Creusot, France, 2001.
[5] J. Rasure and M. Young, "An open environment for image processing software development," in Proceedings of 1992 SPIE/IS&T Symposium on Electronic Imaging, vol. 1659 of SPIE Proceedings, pp. 300–310, San Jose, Calif, USA, February 1992.
[6] A. V. Spirov, A. B. Kazansky, D. L. Timakin, J. Reinitz, and D. Kosman, "Reconstruction of the dynamics of the Drosophila genes from sets of images sharing a common pattern," Journal of Real-Time Imaging, vol. 8, pp. 507–518, 2002.
[7] D. Holloway, J. Reinitz, A. V. Spirov, and C. E. Vanario-Alonso, "Sharp borders from fuzzy gradients," Trends in Genetics, vol. 18, no. 8, pp. 385–387, 2002.
[8] T. C. Lacalli and L. G. Harrison, "From gradients to segments: models for pattern formation in early Drosophila embryogenesis," Semin. Dev. Biol., vol. 2, pp. 107–117, 1991.
Alexander Spirov is an Adjunct Associate
Professor in the Department of Applied
Mathematics and Statistics and the Center for Developmental Genetics at the State
University of New York at Stony Brook,
Stony Brook, New York. Dr. Spirov was born
in St. Petersburg, Russia. He received M.S.
degree in molecular biology in 1978 from
the St. Petersburg State University, St. Petersburg, Russia. He received his Ph.D. in
the area of biometrics in 1987 from the Irkutsk State University,
Irkutsk, Russia. His research interests are in computational biology and bioinformatics, web databases, data mining, artificial intelligence, evolutionary computations, animates, artificial life, and
evolutionary biology. He has published about 80 publications in
these areas.
David M. Holloway is an instructor of
mathematics at the British Columbia Institute of Technology and a Research Associate
in chemistry at the University of British
Columbia, Vancouver, Canada. His research
is focused on the formation of spatial pattern in developmental biology (embryology) in animals and plants. Topics include
the establishment and maintenance of differentiation states, coupling between chemical pattern and tissue growth for the generation of shape, and the
effects of molecular noise on spatial precision. This work is chiefly
computational (the solution of partial differential equation models
for developmental phenomena), but also includes data analysis for
body segmentation in the fruit fly. He received his Ph.D. in physical
chemistry from the University of British Columbia in 1995, and did
postdoctoral fellowships there and at the University of Copenhagen
and Simon Fraser University.
Stuart J. Flockton
Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK
Email: s.flockton@rhul.ac.uk
Received 28 June 2002 and in revised form 29 November 2002
A comparison is made of the behaviour of some evolutionary algorithms in time-varying adaptive recursive filter systems. Simulations show that an algorithm including random immigrants outperforms a more conventional algorithm using the breeder
genetic algorithm as the mutation operator when the time variation is discontinuous, but neither algorithm performs well when
the time variation is rapid but smooth. To meet this deficit, a new hybrid algorithm is introduced which uses a hill climber as an additional
genetic operator, applied for several steps at each generation. A comparison is made of the effect of applying the
hill-climbing operator a few times to all members of the population or a larger number of times solely to the best individual; it is
found that applying it to the whole population yields the better results, substantially improved compared with those obtained using
earlier methods.
Keywords and phrases: recursive filters, evolutionary algorithms, tracking.
1.
INTRODUCTION
Many problems in signal processing may be viewed as system identification. A block diagram of a typical system identification configuration is shown in Figure 1. The information available to the user is typically the input and the noise-corrupted output signals, x(n) and a(n), respectively, and
the aim is to identify the properties of the unknown system by, for example, putting an adaptive filter of a suitable
structure in parallel to the unknown system and altering the
parameters of this filter to minimise the error signal ε(n).
When the nature of the unknown system requires pole-zero
modelling, there is a difficulty in adjusting the parameters
of the adaptive filter, as the mean square error (MSE) is a
nonquadratic function of the recursive filter coefficients, so
the error surface of such a filter may have local minima as
well as the global minimum that is being sought. The ability
of evolutionary algorithms (EAs) to find global minima of
multimodal functions has led to their application in this area
[1, 2, 3, 4].
All these authors have considered only time-invariant
unknown systems. However, in many real-life applications,
time variations are an ever-present feature. In noise or echo
cancellation, for example, the unknown system represents
Figure 1: Block diagram of a typical system identification configuration: the input x(n) drives the unknown system H(z) and the adaptive filter in parallel; noise w(n) corrupts the unknown system output y(n) to give a(n), and the difference between a(n) and the adaptive filter output forms the error signal.
Figure 2: Structure of the pole-zero lattice filter.
2.
The standard genetic algorithm (GA), with its strong selection policy and low rate of mutation, quickly eliminates diversity from the population as it proceeds. In typical function
optimization applications, where the environment remains
static, we are not usually concerned with the population diversity at later stages of the search, so long as the best or mean
value of the population fitness is somewhere near to an acceptable value. However, when the function to be optimized
is nonstationary, the standard GA runs into considerable
problems once the population has substantially converged
on a particular region of the search space. At this point, the
GA is effectively reliant on the small number of random mutations, occurring each generation, to somehow redirect its search to regions of higher fitness, since standard crossover operators are ineffective when the population has become largely homogeneous. This view is borne out by Pettit and Swigger's study [10], in which a Holland-type GA was compared to cognitive (statistical predictive) and random point-mutation models in a stochastically fluctuating environment.
In all cases, the GA performed poorly in tracking the changing environment even when the rate of fluctuation was slow.
An approach to providing EAs capable of functioning well in
time-varying systems is the mutation-based strategy adopted
by Cobb and Grefenstette [5, 6, 7]. In this approach, population diversity is sustained either by replacing a proportion of
the standard GA's population with randomly generated individuals (the random immigrants strategy), or by increasing
the mutation rate when the performance of the GA degrades
(triggered hypermutation). Cobb's hypermutation operator is
adaptive, briefly increasing the mutation rate when it detects
that a degradation of performance (measured as a running
average of the best performing population members over five
generations) has occurred. However, it is easy to contrive categories of environmental change which would not trigger the
hypermutable state. On continuously changing functions,
the hypermutation GA has a greater variance in its tracking
performance than either the standard or random immigrants
GA. In oscillating environments, where the changes are more
drastic, the high mutation level of the hypermutation GA destroys much of the information contained in the current population.
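The random immigrants strategy described above can be sketched in a few lines; the function names, the replacement rate, and the ranking-based choice of which individuals to replace are illustrative assumptions, not taken from the paper.

```python
import random

def random_immigrants_step(population, fitness, random_individual, rate=0.2):
    """After ranking the population by fitness (higher is better), keep
    the best individuals and replace the worst `rate` fraction with
    freshly generated random individuals, so that diversity is maintained
    even after the population has largely converged."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_immigrants = int(len(population) * rate)
    survivors = ranked[:len(population) - n_immigrants]
    immigrants = [random_individual() for _ in range(n_immigrants)]
    return survivors + immigrants

# Toy usage: individuals are coefficient vectors in [-1, 1].
random.seed(0)
pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
fit = lambda g: -sum(v * v for v in g)   # closer to the origin is fitter
new_pop = random_immigrants_step(
    pop, fit, lambda: [random.uniform(-1, 1) for _ in range(3)])
```

The constant injection of random individuals is what allows the search to escape a converged region when the environment changes.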
One of the main difficulties encountered in recursive adaptive systems is the fact that the system can become unstable if the coefficients are unconstrained. With many filter structures, it is not immediately obvious whether any particular set of coefficients will result in the presence of a pole outside the unit circle, and hence instability. On the other hand, it is important that the adaptive algorithm is able to cover the entire stable coefficient space, so it is desirable to adopt
a structure which will make this possible at the same time as
making stability monitoring easy. It is for this reason that the
pole-zero lattice filter [11] was adopted for this work. A block
diagram of the filter structure is given in Figure 2.
The input-output relation of the filter is given by

    y(n) = sum_{i=0}^{N} vi(n) Bi(n),    (1)

where the vi(n) are the tap coefficients and Fi(n) and Bi(n) are the forward and backward residuals, given by

    Bi(n) = Bi-1(n) + ki(n) Fi(n),    i = 1, 2, . . . , N,
    Fi-1(n) = Fi(n) - ki(n) Bi-1(n),    i = N, . . . , 1,
    FN(n) = x(n).    (2)
It can be shown that a necessary and sufficient condition for all of the roots of the pole polynomial to lie within the unit circle is |ki| < 1, i = 1, . . . , N, so the stability of candidate models can be guaranteed merely by restricting the range over which the feedback coefficients are allowed
to vary. Since this must be done when implementing the GA
anyway, the ability to maintain filter stability is essentially obtained without cost.
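The stability condition is trivial to check in code; a minimal sketch follows, in which the function names and the clipping margin are my own assumptions (the paper simply bounds the range of each gene).

```python
def is_stable(k):
    """A pole-zero lattice filter has all poles strictly inside the unit
    circle if and only if every reflection coefficient satisfies |ki| < 1."""
    return all(abs(ki) < 1.0 for ki in k)

def bound_reflection_coeffs(k, margin=1e-6):
    """Force an arbitrary coefficient vector into the stable region by
    clipping each reflection coefficient to (-1, 1), mirroring the range
    restriction the GA applies to each gene."""
    hi = 1.0 - margin
    return [max(-hi, min(hi, ki)) for ki in k]

assert is_stable([0.5, -0.9, 0.0])
assert not is_stable([1.2, 0.3])
assert is_stable(bound_reflection_coeffs([1.2, -3.0, 0.4]))
```

Because the GA already restricts each gene's range, this check costs nothing extra at run time.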
3.1.
The tracking performance is measured by the normalised mean square error, formed from E[e(n)^2] and the minimum MSE (equations (3) and (4)); the speed of the time variation is characterised by a parameter d(n), with d(n) <= 1 for slow variation (equation (5)).
In this section, the performance of two genetic adaptive algorithms operating in a variety of nonstationary environments is compared.
Figure: NMSE (dB) and the tracks of coefficients a1, a2, and a3 over 1000 generations for the standard GA and the random immigrants GA, together with the true values of the coefficients.
the local variations in the parameters. In this way, the two
major failings of the individual components of the hybrid
can be addressed. The GA is often capable of finding reasonable solutions to quite difficult problems, but its characteristic slow finishing is legendary. Conversely, the huge array of
gradient-based and gradientless local search techniques run
the risk of becoming hopelessly entangled in local optima. In
combining these two methodologies, the hybrid GA has been
shown to produce improvements in performance over the
constituent search techniques in certain problem domains
[17, 18, 19, 20].
Goldberg [15, page 202] discusses a number of ways in
which local search and GAs may be hybridized. In one configuration, the hybrid is described in terms of a batch scheme.
The GA is run long enough for the population to become
largely homogeneous. At this point, the local optimization
procedure takes over and continues the search, from perhaps the best 5 or 10% of solutions in the population, until improvement is no longer possible. This method allows
the GA to determine the gross features of the solution space,
hopefully resulting in convergence to the basin of attraction
around the global optimum, before switching to a technique
better suited to fine tuning of the solutions. An alternative
approach is to embed the local search within the framework
of the GA, treating it rather like another genetic operator.
This is the scheme adopted by Kido et al. [18] (who combine GA, simulated annealing, and TABU search), Bersini
and Renders [20] (whose GA incorporates a hill-climbing
operator), and Miller et al. [19] (who employ a variety of
problem-specific local improvement operators). This second
hybrid configuration is better suited to the identification of
time-varying systems. In this case, the local search heuristic
is embedded within the framework of the EA and is treated
as another genetic operator. The local optimization scheme is
enabled for a certain number of iterations at regular intervals
in the GA run.
The hybrid approach utilizes a random hill-climbing
technique to perform periodic local optimization. This procedure is ideally suited to incorporation in the EA since it
does not require calculation of gradients or any other auxiliary information. Instead, the same evaluation function
can be employed to determine the merit of the newly sampled points in the coefficient space. Since the technique is
greedy, the locally optimized solution is always at least as
good as its genetic predecessor. In addition, once a change
in the unknown system has occurred and is detected by a
degradation of the model's performance, no new data samples are required. The hill-climbing method incorporated
here into the GA is the random search technique proposed
by Solis and Wets [21]. This algorithm randomly generates a new search point from a uniform distribution centred about the current coefficient set. The standard deviation of the distribution, sigma_k, is expanded or contracted in
relation to the success of the algorithm in locating better
performing models. If the first-chosen new point is not an
improvement on the original point, the algorithm tests another point the same distance away in exactly the opposite
direction.
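A minimal sketch of this hill climber follows, assuming a uniform perturbation of half-width sigma and simple multiplicative step-size adaptation; the full Solis and Wets scheme also maintains a bias term and success/failure counters, which are omitted here, and the function names are my own.

```python
import random

def solis_wets_step(x, cost, sigma, expand=2.0, contract=0.5):
    """One iteration of a simplified Solis-Wets random hill climber:
    draw a perturbation uniformly from [-sigma, sigma] about the current
    point; if the forward point is no better, test the mirrored point in
    exactly the opposite direction; grow sigma on success and shrink it
    on failure. `cost` is minimised. Returns (new_x, new_sigma)."""
    step = [random.uniform(-sigma, sigma) for _ in x]
    fx = cost(x)
    forward = [xi + si for xi, si in zip(x, step)]
    if cost(forward) < fx:
        return forward, sigma * expand
    backward = [xi - si for xi, si in zip(x, step)]  # opposite direction
    if cost(backward) < fx:
        return backward, sigma * expand
    return x, sigma * contract  # greedy: never accept a worse point

# Usage: refine a coefficient set against a quadratic error surface.
random.seed(1)
x, sigma = [2.0, -2.0], 0.5
for _ in range(200):
    x, sigma = solis_wets_step(x, lambda p: sum(v * v for v in p), sigma)
```

Because acceptance is greedy, the locally optimised point is never worse than its genetic predecessor, which is why the operator can be embedded safely in the EA.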
Figure: Tracks of coefficients a1, a2, and a3 over 1000 generations, compared with the true values of the coefficients.
CONCLUSIONS
when the time variations were rapid and continuous (d > 1).
In the final section of the paper, a hybrid scheme is introduced and shown to be more effective than either of the earlier schemes for tracking these rapid variations.
REFERENCES
[1] D. M. Etter, M. J. Hicks, and K. H. Cho, "Recursive adaptive filter design using an adaptive genetic algorithm," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 82), vol. 2, pp. 635-638, IEEE, Paris, France, May 1982.
[2] R. Nambiar, C. K. K. Tang, and P. Mars, "Genetic and learning automata algorithms for adaptive digital filters," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 92), pp. 41-44, IEEE, San Francisco, Calif, USA, March 1992.
[3] K. Kristinsson and G. A. Dumont, "System identification and control using genetic algorithms," IEEE Trans. Systems, Man, and Cybernetics, vol. 22, no. 5, pp. 1033-1046, 1992.
[4] M. S. White and S. J. Flockton, "Adaptive recursive filtering using evolutionary algorithms," in Evolutionary Algorithms in Engineering Applications, D. Dasgupta and Z. Michalewicz, Eds., pp. 361-376, Springer-Verlag, Berlin, Germany, 1997.
[5] H. G. Cobb, "An investigation into the use of hypermutation as an adaptive operator in genetic algorithms having continuous, time-dependent nonstationary environments," Tech. Rep. 6760, Navy Center for Applied Research in Artificial Intelligence, Washington, DC, USA, December 1990.
[6] J. J. Grefenstette, "Genetic algorithms for changing environments," in Proc. 2nd International Conference on Parallel Problem Solving from Nature (PPSN II), R. Manner and B. Manderick, Eds., pp. 137-144, Elsevier, Amsterdam, September 1992.
[7] H. G. Cobb and J. J. Grefenstette, "Genetic algorithms for tracking changing environments," in Proc. 5th International Conference on Genetic Algorithms (ICGA 93), S. Forrest, Ed., pp. 523-530, Morgan Kaufmann, San Mateo, CA, USA, July 1993.
[8] A. Neubauer, "A comparative study of evolutionary algorithms for on-line parameter tracking," in Proc. 4th International Conference on Parallel Problem Solving from Nature (PPSN IV), H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, Eds., pp. 624-633, Springer-Verlag, Berlin, Germany, September 1996.
[9] F. Vavak, T. C. Fogarty, and K. Jukes, "A genetic algorithm with variable range of local search for tracking changing environments," in Proc. 4th International Conference on Parallel Problem Solving from Nature (PPSN IV), H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, Eds., pp. 376-385, Springer-Verlag, Berlin, Germany, September 1996.
[10] E. Pettit and K. M. Swigger, "An analysis of genetic-based pattern tracking and cognitive-based component tracking models of adaptation," in Proc. National Conference on Artificial Intelligence (AAAI 83), pp. 327-332, Morgan Kaufmann, San Mateo, CA, USA, August 1983.
[11] A. H. Gray Jr. and J. D. Markel, "Digital lattice and ladder filter synthesis," IEEE Transactions on Audio and Electroacoustics, vol. 21, no. 6, pp. 491-500, 1973.
[12] O. Macchi, Adaptive Processing: The Least Mean Squares Approach with Applications in Transmission, John Wiley & Sons, Chichester, UK, 1995.
[13] O. Macchi, N. Bershad, and M. Mboup, "Steady-state superiority of LMS over LS for time-varying line enhancer in noisy environment," IEE Proceedings F, vol. 138, no. 4, pp. 354-360, 1991.
Victor B. Ciesielski
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne, 3001 Victoria, Australia
Email: vc@cs.rmit.edu.au
Peter Andreae
School of Mathematical and Computing Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
Email: pondy@mcs.vuw.ac.nz
Received 30 June 2002 and in revised form 7 March 2003
This paper describes a domain-independent approach to the use of genetic programming for object detection problems in which
the locations of small objects of multiple classes in large images must be found. The evolved program is scanned over the large
images to locate the objects of interest. The paper develops three terminal sets based on domain-independent pixel statistics
and considers two different function sets. The fitness function is based on the detection rate and the false alarm rate. We have tested the method on three object detection problems of increasing difficulty. This work not only extends genetic programming
to multiclass-object detection problems, but also shows how to use a single evolved genetic program for both object classification
and localisation. The object classification map developed in this approach can be used as a general classification strategy in genetic
programming for multiple-class classification problems.
Keywords and phrases: machine learning, neural networks, genetic algorithms, object recognition, target detection, computer
vision.
1. INTRODUCTION
multiple independent stages, such as preprocessing, edge detection, segmentation, feature extraction, and object classification [6, 7], which often results in some efficiency and effectiveness problems. The final results rely too much upon the results of earlier stages. If some objects are lost in one of the early stages, it is very difficult or impossible to recover them
in the later stage. To avoid these disadvantages, this paper introduces a single-stage approach.
There have been a number of reports on the use of genetic programming (GP) in object detection and classification [8, 9]. Winkeler and Manjunath [10] describe a GP
system for object detection in which the evolved functions
operate directly on the pixel values. Teller and Veloso [11]
describe a GP system and a face recognition application in
which the evolved programs have a local indexed memory.
All of these approaches are based on detecting one class of
objects or two-class classification problems, that is, objects
versus everything else. GP naturally lends itself to binary
problems as a program output of less than 0 can be interpreted as one class and greater than or equal to 0 as the other
class. It is not obvious how to use GP for more than two
classes. The approach in this paper will focus on object detection problems in which a number of objects in more than
two classes of interest need to be localised and classified.
1.1. Outline of the approach to object detection
A brief outline of the method is as follows.
(1) Assemble a database of images in which the locations
and classes of all of the objects of interest are manually
determined. Split these images into a training set and
a test set.
(2) Determine an appropriate size (n × n) of a square
which will cover all single objects of interest to form
the input field.
(3) Invoke an evolutionary process with images in the
training set to generate a program which can determine the class of an object in its input field.
(4) Apply the generated program as a moving window
template to the images in the test set and obtain the
locations of all the objects of interest in each class. Calculate the detection rate (DR) and the false alarm rate
(FAR) on the test set as the measure of performance.
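Step (4) above amounts to a sliding-window sweep of the evolved program over each test image; the sketch below illustrates this, where `classify` stands in for the evolved program and the function names and centre-of-window convention are my own assumptions.

```python
def sweep(image, classify, n, step=1):
    """Sweep an n-by-n window over a 2-D image (a list of lists of pixel
    values) and record, for every position, the class label returned by
    `classify` for the window centred there. `classify` takes the n-by-n
    sub-image and returns a class label, with None meaning background."""
    detections = []
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - n + 1, step):
        for c in range(0, cols - n + 1, step):
            window = [row[c:c + n] for row in image[r:r + n]]
            label = classify(window)
            if label is not None:
                # Record the window centre as the object location.
                detections.append((r + n // 2, c + n // 2, label))
    return detections

# Toy usage: "objects" are bright pixels; classify by window mean.
img = [[0] * 8 for _ in range(8)]
img[4][5] = 9
hits = sweep(img, lambda w: "obj" if sum(map(sum, w)) / 9 > 0.5 else None, 3)
```

Note that a single object typically triggers several overlapping windows, which is one reason a localisation tolerance is needed when scoring detections.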
1.2. Goals
The overall goal of this paper is to investigate a learning/adaptive, single-stage, and domain-independent approach to multiple-class object detection problems without
any preprocessing, segmentation, and specific feature extraction. This approach is based on a GP technique. Rather
than using specific image features, pixel statistics are used
as inputs to the evolved programs. Specifically, the following
questions will be explored on a sequence of detection problems of increasing difficulty to determine the strengths and
limitations of the method.
(i) What image features involving pixels and pixel statistics would make useful terminals?
Structure
2. LITERATURE REVIEW
2.1. Object detection
Figure 1: The stages of a traditional multistage object detection approach: source databases, preprocessing, segmentation, feature extraction, and classification.
single class. One special case in this category is that there is only
one object of interest in each source image. In nature, these
problems contain a binary classification problem: object versus nonobject, also called object versus background. Examples
are detecting small targets in thermal infrared images [16]
and detecting a particular face in photograph images [20].
(ii) Multiple-class object detection problem, where there
are multiple object classes of interest, each of which has multiple objects in each image. Detection of handwritten digits
in zip code images [21] is an example of this kind.
It is possible to view a multiclass problem as a series of binary problems. A problem with objects in 3 classes of interest can be implemented as class 1 against everything else, class 2 against everything else, and class 3 against everything else. However, these are not independent detectors, as some method of dealing with situations where two detectors report an object at the same location must be provided.
In general, multiple-class object detection problems are
more difficult than one-class detection problems. This paper is focused on detecting multiple objects from a number of classes in a set of images, which is particularly difficult. Most
research in object detection which has been done so far belongs to the one-class object detection problem.
2.2. Performance evaluation
In this paper, we use the DR and FAR to measure the performance of multiclass object detection problems. The DR
refers to the number of small objects correctly reported by a
detection system as a percentage of the total number of actual objects in the image(s). The FAR, also called false alarms
per object or false alarms/object [16], refers to the number
of nonobjects incorrectly reported as objects by a detection
system as a percentage of the total number of actual objects
in the image(s). Note that the DR is between 0 and 100%,
while the FAR may be greater than 100% for difficult object
detection problems.
The main goal of object detection is to obtain a high DR
and a low FAR. There is, however, a trade-o between them
for a detection system. Trying to improve the DR often results
in an increase in the FAR, and vice versa. Detecting objects in images with very cluttered backgrounds is an extremely difficult problem, where FARs of 200-2000% (i.e., the detection system suggests that there are up to 20 times as many objects as there really are) are common [5, 16].
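The DR and FAR definitions above can be sketched as follows; the one-to-one matching rule and the Chebyshev-distance tolerance are my own assumptions, not taken from the paper, though the 2-pixel tolerance matches the fitness parameter table.

```python
def detection_metrics(reported, actual, tolerance=2):
    """Compute the detection rate (DR) and the false alarm rate (FAR),
    both as percentages of the number of actual objects. A reported
    object (x, y, class) counts as correct if an unmatched actual object
    of the same class lies within `tolerance` pixels (Chebyshev
    distance); unmatched reports are false alarms. FAR can exceed 100%."""
    remaining = list(actual)
    correct = 0
    for (x, y, cls) in reported:
        match = next((a for a in remaining
                      if a[2] == cls
                      and max(abs(a[0] - x), abs(a[1] - y)) <= tolerance),
                     None)
        if match is not None:
            remaining.remove(match)   # each actual object matched once
            correct += 1
    n = len(actual)
    dr = 100.0 * correct / n
    far = 100.0 * (len(reported) - correct) / n
    return dr, far

# Two actual objects; three reports, one of which is a false alarm.
actual = [(10, 10, "a"), (40, 40, "b")]
reported = [(11, 9, "a"), (40, 41, "b"), (70, 70, "a")]
print(detection_metrics(reported, actual))  # (100.0, 50.0)
```

Matching each actual object at most once prevents several overlapping reports of the same object from inflating the DR.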
Most research which has been done in this area so far only
presents the results of the classification stage (only the final
stage in Figure 1) and assumes that all other stages have been
properly done. However, the results presented in this paper
are the performance for the whole detection problem (both
the localisation and the classification).
2.3. Related work: GP for object detection
Since the early 1990s, there has been only a small amount
of work on applying GP techniques to object classification,
object detection, and other vision problems. This, in part,
reflects the fact that GP is a relatively young discipline compared with, say, NNs.
2.3.1 Object classification
Tackett [9, 22] uses GP to assign detected image features to a
target or nontarget category. Seven primitive image features
and twenty statistical features are extracted and used as the
terminal set. The 4 standard arithmetic operators and a logic
function are used as the function set. The fitness function is
based on the classification result. The approach was tested
on US Army NVEOD Terrain Board imagery, where vehicles,
such as tanks, need to be classified. The GP method outperformed both an NN classifier and a binary tree classifier on
the same data, producing lower rates of false positives for the
same DRs.
Andre [23] uses GP to evolve functions that traverse an image, calling upon coevolved detectors in the form of hit-miss matrices to guide the search. These hit-miss matrices
are evolved with a two-dimensional genetic algorithm. These
evolved functions are used to discriminate between two letters or to recognise single digits.
Koza in [24, Chapter 15] uses a turtle to walk over a
bitmap landscape. This bitmap is to be classified either as a
letter L, a letter I, or neither of them. The turtle has access to the values of the pixels in the bitmap by moving over
them and calling a detector primitive. The turtle uses a decision tree process, in conjunction with negative primitives, to
walk over the bitmap and decide which category a particular
landscape falls into. Using automatically defined functions as
local detectors and a constrained syntactic structure, some
perfect scoring classification programs were found. Further
experiments showed that detectors can be made for different
sizes and positions of letters, although each detector has to
be specialised to a given combination of these factors.
Teller and Veloso [11] use a GP method based on the
PADO language to perform face recognition tasks on a
database of face images in which the evolved programs have
a local indexed memory. The approach was tested on a
discrimination task between 5 classes of images [25] and
achieved up to 60% correct classification for images without
noise.
Robinson and McIlroy [26] apply GP techniques to the
problem of eye location in grey-level face images. The input data from the images is restricted to a 3000-pixel block
around the location of the eyes in the face image. This approach produced promising results over a very small training set, up to 100% true positive detection with no false positives, on a three-image training set. Over larger sets, the GP
approach performed less well however, and could not match
the performance of NN techniques.
Winkeler and Manjunath [10] produce genetic programs
to locate faces in images. Face samples are cut out and
scaled, then preprocessed for feature extraction. The statis-
2.3.2 Object detection
All of the reported GP-based object detection approaches belong to the one-class object detection category. In these detection problems, there is only one object class of interest in the
large images.
Howard et al. [19] present a GP approach to automatic
detection of ships in low-resolution synthetic aperture radar
imagery. A number of random integer/real constants and
pixel statistics are used as terminals. The 4 arithmetic operators and min and max operators constitute the function
set. The fitness is based on the number of the true positive
and false positive objects detected by the evolved program.
A two-stage evolution strategy was used in this approach. In
the first stage, GP evolved a detector that could correctly distinguish the target (ship) pixels from the nontarget (ocean)
pixels. The best detector was then applied to the entire image and produced a number of false alarms. In the second
stage, a brand new run of GP was tasked to discriminate between the clear targets and the false alarms as identified in the
first stage and another detector was generated. This two-stage
process resulted in two detectors that were then fused using
the min function. These two detectors return a real number,
which if greater than zero denotes a ship pixel, and if zero or
less denotes an ocean pixel. The approach was tested on images chosen from commercial SAR imagery, a set of 50 m and
100 m resolution images of the English Channel taken by the
European Remote Sensing satellite. One of the 100 m resolution images was used for training, two for validation, and two
for testing. The training was quite successful with perfect DR
and no false alarms, while there was only one false positive
in each of the two test images and the two validation images
which contained 22, 22, 48, and 41 true objects.
Isaka [27] uses GP to locate mouth corners in small (50 × 40) images taken from images of faces. Processing each
pixel independently using an approach based on relative intensities of surrounding pixels, the GP approach was shown
to perform comparably to a template matching approach on
the same data.
A list of object detection related work based on GP is
shown in Table 1.
3.
3.1.
Table 1: Object detection related work based on GP.

Applications                     Authors                  Year  Source
Tank detection (classification)  Tackett                  1993  [9]
Tank detection (classification)  Tackett                  1994  [22]
Letter recognition               Andre                    1994  [23]
Object classification            Koza                     1994  [24]
Face recognition                 Teller and Veloso        1995  [11]
Small target classification                               1998  [28]
Object detection                 Winkeler and Manjunath   1997  [10]
Shape recognition                Teller and Veloso        1995  [25]
Eye recognition                  Robinson and McIlroy     1995  [26]
Ship detection                   Howard et al.            1999  [19]
Mouth detection                  Isaka                    1997  [27]
Mouth detection                  Benson                   2000  [29]
Vehicle detection                Howard et al.            2002  [30]
Edge detection                   Lucier et al.            1998  [31]
Edge detection                   Koza                     1992  [32]
Edge detection                   Koza                     1993  [33]
Edge detection                   Howard et al.            2001  [34]
Edge detection                   Poli                     1996  [35]
Model interpretation             Lindblad et al.          2002  [36]
Stereoscopic vision              Graae et al.             2000  [37]
Image compression                                         1996  [38]
entire images in the training set to detect the objects of interest. In the test procedure, the best evolved genetic program
obtained in the learning process is then applied to the entire images in the test set to measure object detection performance.
The learning/evolutionary process in our GP approach is
summarised as follows.
(1) Initialise the population.
(2) Repeat until a termination criterion is satisfied.
(2.1) Evaluate the individual programs in the current
population. Assign a fitness to each program.
(2.2) Until the new population is fully created, repeat
the following:
(i) select programs in the current generation;
(ii) perform genetic operators on the selected
programs;
(iii) insert the result of the genetic operations
into the new generation.
(3) Present the best individual in the population as the
outputthe learned/evolved genetic program.
In this system, we used a tree-like program structure
to represent genetic programs. The ramped half-and-half
method was used for generating the programs in the initial
population and for the mutation operator. The proportional
selection mechanism and the reproduction, crossover, and
mutation operators were used in the learning process.
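The loop in steps (1)-(3) can be sketched generically as below; the operator signatures and rates are illustrative placeholders (the real system evolves program trees under ramped half-and-half initialisation and proportional selection, not numbers).

```python
import random

def evolve(population, fitness, select, crossover, mutate,
           rates=(0.10, 0.65, 0.25), max_generations=50):
    """Generational loop of steps (1)-(3): evaluate the population, copy
    the best fraction unchanged (reproduction), fill the rest of the next
    generation with crossover and mutation offspring, and return the best
    individual seen. Higher fitness is better."""
    repro_rate, cross_rate, mut_rate = rates
    best = max(population, key=fitness)
    for _ in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best = max(best, ranked[0], key=fitness)
        size = len(population)
        nxt = ranked[:int(repro_rate * size)]     # elitist reproduction
        while len(nxt) < size:
            if random.random() < cross_rate / (cross_rate + mut_rate):
                nxt.append(crossover(select(ranked), select(ranked)))
            else:
                nxt.append(mutate(select(ranked)))
        population = nxt
    return best

# Toy usage with numbers standing in for program trees: evolve a value
# close to 3.0 under truncation-style selection.
random.seed(2)
best = evolve([random.uniform(-5, 5) for _ in range(20)],
              fitness=lambda g: -abs(g - 3.0),
              select=lambda ranked: random.choice(ranked[:10]),
              crossover=lambda a, b: (a + b) / 2,
              mutate=lambda a: a + random.gauss(0, 0.3))
```

The reproduction, crossover, and mutation rates here mirror the roles of the genetic parameters described later in Section 3.6.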
In the remainder of this section, we address the other aspects of the learning/evolutionary system: (1) determination
of the terminal set, (2) determination of the function set, (3)
development of a classification strategy, (4) construction of
the fitness measure, and (5) selection of the input parameters and determination of the termination strategy.
3.2. Terminals
For object detection problems, terminals generally correspond to image features. In our approach, we designed three different terminal sets: local rectilinear features, circular features, and pixel features. In all these cases, the features are
statistical properties of regions of the image, and we refer to
them as pixel statistics.
3.2.1
Figure: Overview of the approach: the GP learning/evolutionary process is applied to the entire images in the detection training set to produce genetic programs, which are then applied to the entire images in the detection test set to give the detection results.
The terminals F1, F2, . . . , F20 are pixel statistics (means and standard deviations) of the regions shown in Figure 3, and function set I is given in equation (1).
Figure 3: The input field and the image regions and lines for feature selection in constructing terminals.
In the circular terminal set, each circular boundary Ci of the input field (together with the central pixel) contributes two features: a mean F(2i+1) and a standard deviation F(2i+2).
Figure 4: The input field and the image boundaries for feature extraction in constructing terminals.
Function set II is given in equation (2).
The output of a genetic program in a standard GP system is a floating point number. Genetic programs can be
used to perform one-class object detection tasks by utilising the division between negative and nonnegative numbers of a genetic program output. For example, negative
numbers can correspond to the background and nonnegative numbers to the objects in the (single) class of interest. This is similar to binary classification problems in standard GP where the division between negative and nonnegative numbers acts as a natural boundary for a distinction
between the two classes. Thus, genetic programs generated
by the standard GP evolutionary process primarily have the
ability to represent and process binary classification or one-class object detection tasks. However, for the multiple-class
object detection problems described here, where more than
two classes of objects of interest are involved, the standard
GP classification strategy mentioned above cannot be applied.
In this approach, we develop a different strategy which uses a program classification map, as shown in Figure 8, for the multiple-class object detection problems. Based on the output of an evolved genetic program, this map can identify which class the object located in the current input field belongs to. In this map, m refers to the number of object classes
of interest, v is the output value of the evolved program, and
T is a constant defined by the user, which plays the role of a
threshold.
3.5. The fitness function
Since the goal of object detection is to achieve both a high DR
and a low FAR, we should consider a multiobjective fitness
function in our GP system for multiple-class object detection
problems. In this approach, the fitness function is based on
the DR and the FAR. The class of an object in the current input field is determined from the program output v by the classification map

    Class = background,    v < 0,
            class 1,       0 <= v < T,
            class 2,       T <= v < 2T,
            . . .
            class i,       (i - 1)T <= v < iT,
            . . .
            class m,       v >= (m - 1)T.    (3)

Figure 8: The program classification map.
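A minimal sketch of the classification map follows, assuming half-open intervals of width T as described above; the function name and string labels are my own.

```python
def classify_output(v, m, T):
    """Map the floating-point output v of an evolved program to one of m
    object classes or the background: negative values are background, and
    each successive interval of width T corresponds to the next class
    (all values of v >= (m - 1) * T fall into class m)."""
    if v < 0:
        return "background"
    i = int(v // T) + 1   # class index of the interval containing v
    return "class %d" % min(i, m)

assert classify_output(-0.5, 3, 100) == "background"
assert classify_output(40, 3, 100) == "class 1"
assert classify_output(150, 3, 100) == "class 2"
assert classify_output(10**6, 3, 100) == "class 3"
```

The value T = 100 used here matches the fitness parameter table; making the last interval unbounded ensures every program output receives a label.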
Figure: The fitness calculation process: sweep programs on training images, match objects, compute fitness.
3.6.1 Search parameters
The search parameters used here include the number of individuals in the population (population-size), the maximum
depth of the randomly generated programs in the initial population (initial-max-depth), the maximum depth permitted
for programs resulting from crossover and mutation operations (max-depth), and the maximum generations the evolutionary process can run (max-generations). These parameters control the search space and when to stop the learning
process. In theory, the larger these parameters, the more the
chance of success. In practice, however, it is impossible to set
them very large due to the limitations of the hardware and
high cost of computation.
There is another search parameter, the size of the input
field (input-size), which decides the size of the moving window in which a genetic program is computed in the program
sweeping procedure.
3.6.2 Genetic parameters
The genetic parameters decide the number of genetic programs used/produced by different genetic operators in the mating pool to produce new programs in the next generation. These parameters include the percentage of the best individuals in the current population that are copied unchanged to the next generation (reproduction-rate), the percentage of individuals in the next generation that are to be produced by crossover (cross-rate), the percentage of individuals in the next generation that are to be produced by mutation (mutation-rate = 100% − reproduction-rate − cross-rate), the probability that, in a crossover operation, two terminals will be swapped (cross-term), and the probability that, in a crossover operation, random subtrees will be swapped (cross-func = 100% − cross-term).
3.6.3 Fitness parameters
Parameter names
Easy images
Coin images
Retina images
Search parameters
Population-size
Initial-max-depth
Max-depth
Max-generations
Input-size
100
4
8
100
14 14
500
5
12
150
24 24
700
6
20
150
16 16
Genetic parameters
Reproduction-rate
Cross-rate
Mutation-rate
Cross-term
Cross-func
10%
65%
25%
15%
85%
1%
74%
25%
15%
85%
2%
73%
25%
15%
85%
Fitness parameters
T
Wf
Wd
Tolerance (pixels)
100
50
1000
2
100
50
1000
2
100
50
3000
2
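The tolerance parameter enters when matching detected object positions against the labelled ones to compute the detection rate (DR) and false alarm rate (FAR). The exact matching rule is not given in this excerpt; the sketch below assumes greedy one-to-one matching under a Chebyshev (max-coordinate) distance, and reports both rates as percentages of the number of labelled objects, which is consistent with the FARs above 100% reported later for the retina images.

```python
def detection_stats(detected, labelled, tolerance=2):
    """Greedily match detected object centres to labelled ones within
    `tolerance` pixels, then report the detection rate and false alarm
    rate as percentages of the number of labelled objects."""
    unmatched = list(labelled)
    hits = 0
    for (dr, dc) in detected:
        for (lr, lc) in unmatched:
            if max(abs(dr - lr), abs(dc - lc)) <= tolerance:
                unmatched.remove((lr, lc))  # each labelled object matched once
                hits += 1
                break
    detection_rate = 100.0 * hits / len(labelled)
    false_alarm_rate = 100.0 * (len(detected) - hits) / len(labelled)
    return detection_rate, false_alarm_rate
```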
4. IMAGE DATABASES
We used three different databases in the experiments. Example images and key characteristics are given in Figure 10. The databases were selected to provide detection problems of increasing difficulty. Database 1 (easy) was generated to give well-defined objects against a uniform background. The pixels of the objects were generated using a Gaussian generator with different means and variances for each class. There are three classes of small objects of interest in this database: black circles (class1), grey squares (class2), and white circles (class3). The Australian coin images (database 2) were intended to be somewhat harder and were taken with a CCD camera over a number of days with relatively similar illumination. In these images, the background varies slightly in different areas of the image and between images, and the objects to be detected are more complex, but still regular. There are 4 object classes of interest: the head side of 5-cent coins (class head005), the head side of 20-cent coins (class head020), the tail side of 5-cent coins (class tail005), and the tail side of 20-cent coins (class tail020). All the objects in each class have a similar size. They are located at arbitrary positions and with some rotations. The retina images (database 3) were taken by a professional photographer with special apparatus at a clinic and contain very irregular objects on a very cluttered background.
Figure 10: Example images and key characteristics of the three databases. Easy images: 10 images, 3 object classes, image size 700 × 700. Coin images: 20 images, 4 object classes, image size 640 × 680. Retina images: 15 images, 2 object classes, image size 1024 × 1024.
Terminal and function sets used in the experiments:
  Experiment I:   TermSet1 (rectilinear) and TermSet2 (circular), with FuncSet1
  Experiment II:  TermSet3 (pixels), with FuncSet1
  Experiment III: TermSet1 (rectilinear), with FuncSet2
5. EXPERIMENTAL RESULTS
In these experiments, 4 out of 10 images in the easy image database are used for training and 6 for testing. For the
coin images, 10 out of 20 are used for training and 10 for
testing. For the retina images, 10 are used for training and
5 for testing. The total number of objects is 300 for the easy
image database, 400 for the Australian coin images, and 328
for the retina images. The results presented in this section
were achieved by applying the evolved genetic programs to
the images in the test sets.
5.1. Experiment I
This group constitutes the major part of the investigation. The main goal here is to investigate whether this GP approach can be applied to multiple-class object detection problems of increasing difficulty. The parameters used
in these experiments are shown in Table 3 (Section 3.6.4).
The average performance of the best 10 genetic programs
(evolved from 10 runs) for the easy and the coin databases,
and the average performance of the best 5 genetic programs
(out of 5 runs, due to the high computational cost) for the
retina images are presented.
The results are compared with those obtained using an NN approach for object detection on the same databases [12, 39]. The NN method used was the same as the GP method shown in Section 1.1, except that the evolutionary process was replaced by a network training process in step (3) and the generated genetic program was replaced by a trained network. In this group of experiments, the networks also used the same set of pixel statistics as TermSet1 (rectilinear) as inputs. Considerable effort was expended in determining the best network architectures and training parameters. The results presented here are the best results achieved by the NNs, and we believe that the comparison with the GP approach is a fair one.
5.1.1 Easy images
Table 5 shows the best results of the GP approach with the two different terminal sets (GP1 with TermSet1, GP2 with TermSet2) and the NN method for the easy images. For class1 (black circles) and class3 (white circles), all three methods achieved a 100% DR with no false alarms. For class2 (grey squares), the two GP methods also achieved 100% DR with zero false alarms. However, the NN method had an FAR of 91.2% at a DR of 100%.
5.1.2 Coin images
Experiments with the coin images gave similar results to the easy images. These are shown in Table 6. Detecting the heads and tails of 5-cent coins (classes head005, tail005) appears to be relatively straightforward. All three methods achieved a 100% DR without any false alarms. Detecting heads and tails of 20-cent coins (classes head020, tail020) is more difficult. While the NN method resulted in many false alarms, the two GP methods had much better results. In particular, the GP1 method achieved the ideal result, that is, all the objects of interest were correctly detected without any false alarms for all 4 object classes.
5.1.3 Retina images
The results for the retina images are summarised in Table 7. Compared with the results for the other image databases, these results are not satisfactory.3 However, the FAR is greatly improved over the NN method.
The results over the three databases show similar patterns: the GP-based method always gave a lower FAR than the NN approach for the same detection rate. While GP2 also gave the ideal results for the easy images, it produced a higher FAR on both the coin and the retina images than the GP1 method. This suggests that the local rectilinear features are more effective for these detection problems than the circular features.
5.1.4 Training times
We performed these experiments on a 4-processor ULTRA-SPARC4. The training times for the three databases are very
3 With the current techniques applied in this area, detecting objects in images with a highly cluttered background is an extremely difficult problem [5, 16]. In fact, these results are quite competitive with other methods for very difficult detection problems. For such a young discipline, it is quite promising that GP can achieve such results.
Table 5: Results for the easy images.

                               Object classes
                           class1   class2   class3
Best detection rate (%)      100      100      100
False alarm rate (%)  NN       0     91.2        0
                      GP1      0        0        0
                      GP2      0        0        0

Table 6: Results for the coin images.

                               Object classes
                         head005  tail005  head020  tail020
Best detection rate (%)     100      100      100      100
False alarm rate (%)  NN      0        0      182     37.5
                      GP1     0        0        0        0
                      GP2     0        0     38.4     26.7

Table 7: Results for the retina images.

                               Object classes
                            haem    micro
Best detection rate (%)    73.91      100
False alarm rate (%)  NN    2859    10104
                      GP1   1357      588
                      GP2   1857      732
different due to the various degrees of difficulty of the detection problems. The average training times used in the GP evolutionary process (GP1) for the easy, the coin, and the retina images are 2 minutes, 36 hours, and 93 hours, respectively.4 This is much longer than the NN method, which took 2 minutes, 35 minutes, and 2 hours on average. However, the GP method gave much better detection results on all three databases. This suggests that the GP method is particularly applicable to tasks where accuracy is the most important factor and training time is seen as relatively unimportant.
4 Even if the training time for difficult problems is very long, the time spent applying the learned genetic program to the test set is usually very short, say, from several seconds to about one minute.
Table 8: Best results using the extended function set (FuncSet2).

Easy images:   DR 100% for all three classes, with no false alarms.
Coin images:   DR 100% for all four classes, with no false alarms.
Retina images: haem, best DR 73.91% with FAR 1214%; micro, best DR 100% with FAR 463%.
5.2. Experiment II
Instead of using rectilinear and circular features (pixel statistics) as in Experiment I, Experiment II directly uses the pixel values as terminals (the third terminal set). For input field sizes of 14 × 14, 24 × 24, and 16 × 16 for the easy, the coin, and the retina images, the numbers of terminals are 49 (7 × 7), 144 (12 × 12), and 64 (8 × 8), respectively. For the easy images, the learning took about 70 hours and 78 generations on a 4-processor ULTRA-SPARC4 machine to reach perfect detection performance on the training set. The population size used was 1000, the maximum depth of the program was 30, the maximum initial depth 10, and the maximum number of generations 100. For the coin images and the retina images, the situation was worse. Since a large number of terminals were used, the maximum depth of the program trees was increased to 50 for the coin images and 60 for the retina images. The population size for both databases was 3000, with a maximum number of generations of 100. The evolutionary process took three weeks to complete 50 generations for the coin images and five weeks to complete 50 generations for the retina images. The best detection results were an overall 22% FAR at a 100% DR for the coin images, and about 850% FAR at a DR of 100% for microaneurisms in the retina images.
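The reported terminal counts (49 from a 14 × 14 field, 144 from 24 × 24, 64 from 16 × 16) correspond to halving the input window along each axis. How that reduction is performed is not stated in this excerpt; the sketch below assumes averaging each 2 × 2 block, purely for illustration.

```python
import numpy as np

def pixel_terminals(window):
    """Reduce an input window to pixel-value terminals.
    ASSUMPTION: each 2x2 block is averaged, halving each axis, which is
    consistent with the terminal counts reported in the text (e.g. a
    14x14 window yielding 49 = 7x7 terminals)."""
    h, w = window.shape
    return window.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)).ravel()
```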
While these results are worse than those obtained by GP1 and GP2 using the rectilinear and circular features, they are still better than those of the NN approach. In our experience, a larger population (e.g., 10000 or 50000), a larger program size (e.g., 100), and a larger number of generations (e.g., 300) could improve the results. While this is not possible to investigate with our current hardware, it suggests a promising future direction as more powerful hardware, for example, parallel or genetic hardware, is developed.
5.3. Experiment III
Instead of using the four standard arithmetic functions,
this experiment focused on using the extended function
set (FuncSet2), as shown in Section 3.3.2. The parameters
shown in Table 3 (Section 3.6.4) were used in this experiment. The best detection results for the three databases are
shown in Table 8.
As can be seen from Table 8, this function set also gave ideal results for the easy and the coin images and a better result for the retina images. The best DR for detecting micro is 100%, with a corresponding FAR of 463%. The best DR for haem is still 73.91%, but the FAR is reduced to 1214%.

6. DISCUSSION

6.1. Analysis of results on the retina images
Figure 12: Three sample generated programs for simple object detection in the easy images.
6.2.2 Coin images
In addition to the program shown in Figure 6, we present another generated program in Figure 14, which also performed perfectly on the coin images.
Compared with those for the easy images, these programs are more complex, which reflects the greater difficulty of the detection problem in the coin images. One difference is that these programs also contain constants. The set of possible programs is considerably expanded by allowing constants as well as the terminals, but the search for good values for the
Figure 14: A sample generated program for regular object detection in the coin images.
constants is difficult. Our current GP is biased so that constants are only introduced rarely, but it is clear that the detection problem on the coin images is sufficiently difficult to require some of these constants.
6.2.3 Retina images
One evolved genetic program for the retina images is presented in Figure 15. (The program is presented in LISP format rather than standard format because of its complexity.) This program is much more complex than any of the programs for the easy and the coin images. The program uses all 20 terminals and 8 constants. It does not seem possible to make any meaningful interpretation of this program. It may be that with high-level, domain-specific features and domain-specific functions, the GP system could construct simpler and more interpretable programs; however, this would be against one of the goals of this paper, which is to investigate domain-independent approaches.
Even the best programs for the retina images gave quite a high number of false alarms, and it appears that the 20 terminals and 4 standard arithmetic functions are not sufficient for constructing programs for such difficult detection problems. Nonetheless, the program above still had much better performance than an NN with the same input features.
6.3. Analysis of classification strategy
As described in Figure 8, we used a program classification map as the classification strategy. In this map, a constant T was used to give fixed-size ranges for determining the classes of objects from the output of the program. This parameter can be regarded as a threshold or class boundary parameter. Using just a single value for T forces most of the classes to have an equally sized range in the program output, which might lead to a relatively long evolution time. A natural question is whether we can replace the single parameter T with a set of parameters, say, T1, T2, ..., Tm, one for each class of interest.
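With a single boundary parameter T, the program classification map can be sketched as follows; the exact handling of the range boundaries and of outputs outside [0, mT) is an assumption here, not taken from the paper.

```python
def program_class(output, T, m):
    """Program classification map with a single boundary parameter T:
    output in [(i-1)*T, i*T) -> class i (for i = 1..m); any output
    outside [0, m*T) -> background (returned as 0).
    Boundary handling is an assumption for illustration."""
    if output < 0 or output >= m * T:
        return 0  # background
    return int(output // T) + 1
```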
To answer this question, we ran a set of experiments on the easy images with three parameters, T1, T2, and T3, for the thresholds in the program classification map. The experiments showed that some sets of parameter values resulted in ideal performance, but other sets did not. Also, the learning/evolutionary process converged very fast with some sets of values but very slowly with others. However, the results of the experiments gave no guidelines for selecting a good set of values for these parameters. In some cases, using a separate parameter for each threshold may lead to better performance than using a single parameter, but appropriate values for the parameters need to be determined empirically. In practice, this is difficult because in most cases there is no a priori knowledge for setting these parameters.
We also tried an alternative classification strategy, which we called the multiple binary map, to classify multiple classes of objects. In this method, we convert a multiple-class classification problem into a set of binary classification problems. Given a problem L with m classes L = {c1, c2, ..., cm}, the problem is decomposed into L1 = {c1, other}, L2 = {c2, other}, ..., Lm = {cm, other}, where ci denotes the ith class of interest and other refers to the class of nonobjects of interest. In this way, a multiple-class object detection problem is decomposed into a set of one-class object detection tasks, and GP is applied to each of the subsets to obtain the detection result for a particular class of interest. We tested this method on the detection problems in the three image databases and the results were similar to those of the original experiments.
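The decomposition itself can be sketched directly; representing the training data as (features, label) pairs is an assumed representation for illustration.

```python
def binary_tasks(examples, classes):
    """Multiple binary map: decompose an m-class problem into m one-vs-other
    problems.  `examples` is a list of (features, label) pairs; each task
    relabels every example as either the class of interest or 'other',
    so a separate one-class program can be evolved per task."""
    tasks = {}
    for c in classes:
        tasks[c] = [(x, c if y == c else "other") for (x, y) in examples]
    return tasks
```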
One disadvantage of this method is that several genetic programs have to be evolved. On the other hand, the genetic programs may be simpler, which may reduce the training time for each program. In fact, for the coin images problem, a considerably shorter total training time was required to create a set of one-class programs than to create a single multiple-class program. A more detailed discussion of this method is outside the scope of this paper and is left to future work.
Figure 15: A sample generated program for very difficult detection problems in the retina images.

6.4. Analysis of reproduction
In early GP, the reproduction rule did a probabilistic selection of genetic programs from the current population based on fitness.
[Chart residue: plots of best fitness against mutation rates (0%–40%) and against generations (0–100); only the axis labels are recoverable from the extracted text.]

Figure 17: Training easy images based on the old and the new reproduction rules.

7. CONCLUSIONS
of one-class problems. Also, most current research uses different algorithms in multiple independent stages to solve the localisation problem and the classification problem; in contrast, this paper uses a single learned genetic program for both object classification and object localisation.
The experiments showed that mutation does play an important role in the three multiple-class object detection tasks. This is in contrast to Koza's early claim that GP does not need mutation. For GP applied to multiple-class object detection problems, the experiments suggest that a 15%–30% mutation rate would be a good choice.
The experiments also identified some limitations of the particular approach taken in the paper. The first limitation concerns the choice of input features and the function set. For the simple and medium-difficulty object detection problems, the 20 regional/rectilinear features and 4 standard arithmetic functions performed very well; however, they were not adequate for the most difficult object detection task. In particular, they were not adequate for detecting classes of objects with a range of sizes. Further work will be required to discover more effective domain-independent features and function sets, especially ones that provide some size invariance.
A second limitation is the high training time required. One aspect of this training time is the experimentation required to find good values of the various parameters for each different problem. The GP method appears to be applicable to multiple-class object detection tasks where accuracy is the most important factor and training time is seen as relatively unimportant, as is the case in most industrial applications. Further experimentation may reveal more effective ways of determining parameters, which would reduce the training times.
Subject to these limitations, the paper has demonstrated that GP can be used effectively for the multiple-class detection problem and provides more evidence that GP has great potential for application to a variety of difficult real-world problems.
ACKNOWLEDGMENTS
We would like to thank Dr. James Thom at RMIT University
and Dr. Zhi-Qiang Liu at the University of Melbourne for a
number of useful discussions. Thanks also to Peter Wilson
whose basic GP package was used in this project and to Chris
Kamusinski who provided and labelled the retina images.
REFERENCES
[1] P. D. Gader, J. R. Miramonti, Y. Won, and P. Coffield, "Segmentation free shared weight networks for automatic vehicle detection," Neural Networks, vol. 8, no. 9, pp. 1457–1473, 1995.
[2] A. M. Waxman, M. C. Seibert, A. Gove, et al., "Neural processing of targets in visible, multispectral IR and SAR imagery," Neural Networks, vol. 8, no. 7-8, pp. 1029–1051, 1995.
[3] Y. Won, P. D. Gader, and P. C. Coffield, "Morphological shared-weight networks with applications to automatic target recognition," IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 1195–1203, 1997.
[4] H. L. Roitblat, W. W. L. Au, P. E. Nachtigall, R. Shizumura, and G. Moons, "Sonar recognition of targets embedded in sediment," Neural Networks, vol. 8, no. 7-8, pp. 1263–1273, 1995.
[5] M. W. Roth, "Survey of neural network technology for automatic target recognition," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 28–43, 1990.
[6] D. P. Casasent and L. M. Neiberg, "Classifier and shift-invariant automatic target recognition neural networks," Neural Networks, vol. 8, no. 7-8, pp. 1117–1129, 1995.
[7] S. K. Rogers, J. M. Colombi, C. E. Martin, et al., "Neural networks for automatic target recognition," Neural Networks, vol. 8, no. 7-8, pp. 1153–1184, 1995.
[8] J. R. Sherrah, R. E. Bogner, and A. Bouzerdoum, "The evolutionary pre-processor: automatic feature extraction for supervised classification using genetic programming," in Proc. 2nd Annual Conference on Genetic Programming (GP-97), J. R. Koza, K. Deb, M. Dorigo, et al., Eds., pp. 304–312, Morgan Kaufmann, Stanford, Calif, USA, July 1997.
[9] W. A. Tackett, "Genetic programming for feature discovery and image discrimination," in Proc. 5th International Conference on Genetic Algorithms (ICGA-93), S. Forrest, Ed., pp. 303–309, Morgan Kaufmann, Urbana-Champaign, Ill, USA, July 1993.
[10] J. F. Winkeler and B. S. Manjunath, "Genetic programming for object detection," in Proc. 2nd Annual Conference on Genetic Programming (GP-97), J. R. Koza, K. Deb, M. Dorigo, et al., Eds., pp. 330–335, Morgan Kaufmann, Stanford, Calif, USA, July 1997.
[11] A. Teller and M. Veloso, "A controlled experiment: evolution for learning difficult image classification," in Proc. 7th Portuguese Conference on Artificial Intelligence, C. Pinto-Ferreira and N. J. Mamede, Eds., vol. 990 of Lecture Notes in Computer Science, pp. 165–176, Springer-Verlag, Funchal, Madeira Island, Portugal, October 1995.
[12] M. Zhang and V. Ciesielski, "Centred weight initialization in neural networks for object detection," in Computer Science '99: Proc. 22nd Australasian Computer Science Conference, J. Edwards, Ed., pp. 39–50, Springer-Verlag, Auckland, New Zealand, January 1999.
Mengjie Zhang received a B.E. (mechanical engineering) and an M.E. (computer applications) in 1989 and 1992 from the Department of Mechanical and Electrical Engineering, Agricultural University of Hebei, China, and a Ph.D. in computer science from RMIT University, Melbourne, Australia, in 2000. During 1992–1995, he worked at the Artificial Intelligence Research Centre, Agricultural University of Hebei, China. In 2000, he moved to Victoria University of Wellington, New Zealand. His research is focused on data mining, machine learning, and computer vision, particularly genetic programming, neural networks, and object detection. He is also interested in web information extraction and knowledge-based systems.
Victor B. Ciesielski received his B.S. and M.S. degrees in 1972 and 1975, respectively, from the University of Melbourne, Australia, and his Ph.D. degree in 1980 from Rutgers University, USA. He is currently Associate Professor at the School of Computer Science and Information Technology, RMIT University, where he heads the Evolutionary Computation and Machine Learning Group. Dr. Ciesielski's research interests include evolutionary computation, computer vision, data mining, machine learning for robot soccer, and, in particular, genetic programming approaches to object detection and classification.
Peter Andreae received a B.E. (honours) in electrical engineering from the University of Canterbury, New Zealand, in 1977 and a Ph.D. in artificial intelligence from MIT in 1985. Since 1985, he has been teaching computer science at Victoria University of Wellington, New Zealand. His research interests are centered on making agents that can learn behaviour from experience, but he has also worked on a wide range of topics, including reconstructing vasculature from x-rays, clustering algorithms, analysis of micro-array data, programming by demonstration, and software reuse.