
Road Sign Recognition from a Moving Vehicle
Bjorn Johansson
Abstract
This project aims to investigate the current technology for recognising road signs in real time from a moving vehicle. The most promising technology for intelligent vehicle systems is vision sensors and image processing, so this is examined most thoroughly. Different processing algorithms and research around the world concerned with sign recognition are investigated. A functioning system has also been implemented using a standard web camera mounted in a testing vehicle. This system is restricted to speed signs and achieves good performance thanks to fast but still robust algorithms. Colour information is used for the segmentation and a model-matching algorithm is responsible for the recognition. The human-computer interface is a voice announcing which sign has been found.
Contents

1 Introduction
1.1 Background
1.2 The problem

2 Sensing Hardware
2.1 Active versus Passive Sensors
2.2 Vision based intelligent vehicles
2.3 Advantages and Disadvantages of the Computer Vision Approach

3 Processing
3.1 Processing Strategies
3.2 Architectural Issues

4 Previous work
4.1 Using knowledge about road signs
4.1.1 Colour basics
4.1.2 The HSI colour space
4.2 Object detection and recognition using colour
4.3 Detection using Shape
4.4 Algorithm considerations

5 Techniques
5.1 Interesting Techniques
5.2 Sign Recognition
5.3 Multi-feature Hierarchical Template Matching Using Distance Transforms
5.4 Shape recognition using convex hull approximation
5.5 Temporal Integration
5.6 Colour classification with Neural Networks
5.7 Recognition in Noisy Images Using Simulated Annealing
5.8 Using a Statistical Classifier
5.8.1 Statistical Classifier with cascade classification
5.8.2 How to get features from an image
5.9 Present state of the art of Road Sign Recognition Research

6 My Implementation
6.1 Resulting system
6.2 Introduction
6.3 Influences caused by motion
6.4 Practical tests
6.5 Hardware
6.5.1 The Camera
6.5.2 The Computer
6.6 The Program
6.6.1 Internal Segmentation of the Sign
6.6.2 OCR for the Digits
6.6.3 Results
6.7 Trial and error: experiences learned during the process
6.7.1 Colour Segmentation
6.7.2 Filtering/Feature extraction
6.7.3 Using an Edge Image
6.7.4 Compensating for a Rotated/Tilted sign
6.7.5 Discriminators between digits

7 The Future of RSR/ITS
Chapter 1
Introduction
1.1 Background
Road Sign Recognition (RSR) is a field of applied computer vision research concerned with the automatic detection and classification of traffic signs in traffic scene images acquired from a moving car. The results of the RSR research effort can be used as a support system for the driver. When the surrounding environment is understood, computer support can assist the driver in advanced collision prediction and avoidance.
Driving is a task based almost entirely on visual information processing. Road signs and traffic signals define a visual language interpreted by drivers. Road signs carry much information necessary for successful driving: they describe the current traffic situation, define right-of-way, prohibit or permit certain directions, warn about risk factors, etc. Road signs also help drivers with navigation.
Two basic applications of RSR are under consideration in the research community: a driver's aid (DSS) and automated surveillance of road traffic devices. It is desirable to design smart car control systems in such a way as to allow the evolution of fully autonomous vehicles in the future. An RSR system is also being considered as a valuable complement to GPS-based navigation systems: the dynamic environmental map may be enriched with road sign types and positions (acquired by RSR), which will increase the precision of the vehicle positioning.
Problems concerning traffic mobility, safety and energy consumption have become more serious in most developed countries. The endeavours to solve these problems have triggered interest in new fields of research and applications, such as automatic vehicle driving, in which new techniques are investigated for the entire or partial automation of driving tasks. A recently defined comprehensive and integrated system approach, referred to as the intelligent transportation system (ITS), links the vehicle, the infrastructure and the driver to make it possible to achieve more mobile and safer traffic conditions by using state-of-the-art electronic communication and computer-controlled technology. Over time, the ITS research community expects that intelligent vehicles will advance in three primary ways: in the capabilities of in-vehicle systems, in the sophistication of the driver-vehicle interface and in the ability of vehicles to communicate with each other and with a smart infrastructure [1].
An example of this can be found in research by DaimlerChrysler [50]. A vehicle detects ice on the road and notifies a radio transmitting station, which broadcasts this information to all other vehicles approaching the area. The vehicle can also transmit the information directly to vehicles moving behind it and approaching the area. The first car may have to brake, and it will warn approaching vehicles of this intention so that it is not hit from behind. See figure 1.1.
Figure 1.1: Cooperation between the intelligent infrastructure and intelligent vehicles to warn about a slippery road ahead.
Smart vehicles will be able to give route directions, sense objects, warn drivers of impending collisions (with obstacles, other cars and even pedestrians), automatically signal for help in emergencies, keep drivers alert, and may ultimately be able to take over driving.
ITS technologies may provide vehicles with different types and levels of intelligence to complement the driver. Information systems expand the driver's knowledge of routes and locations. Warning systems, such as collision avoidance technologies, enhance the driver's ability to sense the surrounding environment and help the driver in sorting and understanding all the information passed to him via road signs and other types of road markings.
In the last two decades, government institutions world-wide have activated initial explorative phases by means of various projects involving a large number of research units working co-operatively, producing several prototypes and solutions based on rather different approaches.
In Europe, the PROMETHEUS project (PROgraMme for a European Traffic with Highest Efficiency and Unprecedented Safety) started this exploration stage in 1986. The project involved more than 13 vehicle manufacturers and several research units from governments and universities of 19 European countries. Within this framework a number of different ITS approaches were conceived, implemented and demonstrated.
In the United States a great many initiatives were launched to address the mobility problem, involving universities, research centres and automobile companies. After this pilot phase, the US government established the National Automated Highway System Consortium (NAHSC) [3] in 1995 and launched the Intelligent Vehicle Initiative (IVI) right after, in 1997.

Figure 1.2: A typical city image in a non-cluttered scene.
In Japan, where the mobility problem is even more intense and evident, some vehicle prototypes were also developed within the framework of different projects. Similarly to the US case, in 1996 the Advanced Cruise-Assist Highway System Research Association (AHSRA) was established amongst a large number of automobile industries and research centres [4], which developed different approaches to the problem of automatic vehicle guidance.
ITS is now entering its second phase, characterised by a maturity in approaches and by new technological possibilities which allow the development of the first experimental products. A number of prototypes of intelligent vehicles have been designed, implemented and tested on the road. The design of these prototypes has been preceded by the analysis of solutions deriving from similar and close fields of research and has produced a great flourishing of new ideas, innovative approaches and novel ad hoc solutions. Robotics, artificial intelligence, computer science, computer architectures, telecommunications, control and automation, and signal processing are just some of the principal research areas from which the main ideas and solutions were first derived. Initially, the underlying technological devices, such as head-up displays, infrared cameras, radar and sonar, derived from expensive military applications but, thanks to the increased interest in these applications and to the progress in industrial production, today's technology offers sensors, processing systems and output devices at very competitive prices. In order to test a wide spectrum of different approaches, these automatic vehicle prototypes are equipped with a large number of different sensors and computing engines.
1.2 The problem
The problem of detecting road signs might seem well defined and simple. Road signs occur in standardised positions in traffic scenes, and their shapes, colours and pictograms are known (because of international standards).
To see the problem in its full complexity we must add further factors that influence the recognition system's design and performance. Road signs are acquired from a vehicle moving at considerable speed on an (often uneven) road surface, so the traffic scene images often suffer from vibrations, and colour information is affected by varying illumination. Road signs are frequently partially occluded by other vehicles or by objects like trees, lamp poles or other signs. Many objects present in traffic scenes make sign detection hard: pedestrians, other vehicles, buildings and billboards may confuse the detection system with patterns similar to those of road signs. Furthermore, the algorithms must be suitable for real-time implementation, and the hardware platform must be able to process the huge amount of information in the video data stream.
Variations also exist in the actual pictograms on the signs. For example, an examination of the sign A12 "Children" reveals that there is a wide difference between the norm and the signs in actual use; see figure 1.3. The most common differences are in the ideograms and in the widths of the sign border. Some symmetrical signs like "No Stopping" and "End Of Right Of Way" are often inverted. The reason is that these alterations do not cause problems to human drivers and hence have been neglected. For automatic road sign recognition this means that the norms cannot be taken as a fundamental basis, and variations have to be dealt with by the program.
Figure 1.3: Differences between European road signs (sign A12 "Children").
An interesting approach would be to integrate a road sign recognition system with a Geographical Information System (GIS), which could back up the recognition system with stored information about the road on which the vehicle is currently driving. This can increase the safety of the recognition system: a recognition of a 110 sign inside a town can be classified as a false recognition.
A test with ISA (the Swedish abbreviation for Intelligent Support system for Adjustment of speed), which integrates a GPS system with a database of speed limits, has been carried out in a public transport bus. The system makes it difficult to accelerate above the legal speed limit; the accelerator resists the attempt with a counter-force. As reported in Metro [49], the driver finds the system helpful and stress-reducing. He says that he is more aware of the speed limits and is not as tempted as before to increase the speed a little if he falls behind schedule. Safety must come first, the time schedule second.
The same thing might apply to all drivers, including those driving personal vehicles. If a DSS regulates the speed, it is not as tempting to accelerate over the regulated limit when another car comes up too close behind. Attention can then be shifted from the speedometer to the current traffic situation.
Any on-board system for ITS applications needs to meet some important requirements:

- The final system, installed on a commercial vehicle, must be sufficiently robust to adapt to different conditions and changes of environment, road, traffic, illumination and weather. Moreover, the hardware system needs to be resistant to mechanical and thermal stress.
- On-board systems for ITS applications are safety-critical and require a high degree of reliability: the project has to be thorough and rigorous during all its phases, from requirements specification to design and implementation. An extensive phase of testing and validation is therefore of paramount importance.
- For marketing reasons, the design of an ITS system is driven by strict cost criteria (it should cost no more than 10% of the vehicle's price), thus requiring a specific engineering phase. Operating costs (such as power consumption) need to be kept low as well, since vehicle performance should not be affected by the use of ITS apparatus.
- The system's hardware and sensors have to be kept compact in size and should not disturb car styling.
- The design of the driver-vehicle interface (the place where the driver interacts physically and cognitively with the vehicle) is critical. When giving drivers access to ITS systems inside the vehicle, designers must consider not only safety (i.e. not overloading the driver's information-processing resources) but also usability and driver acceptance [5]: interfaces will need to be intelligent and user friendly, effective, and transparent to use; in particular, a full understanding of the subtle tradeoffs of multimodal interface integration will require significant research [2].
Chapter 2
Sensing Hardware
2.1 Active versus Passive Sensors
Laser-based sensors and millimetre-wave radar detect the distance of objects by measuring the travel time of a signal emitted by the sensors themselves and reflected by the object; they are therefore classified as active sensors. Their main common drawbacks are low spatial resolution and slow scanning speed. However, millimetre-wave radar is more robust to rain and fog than laser-based radar, though more expensive.
Vision-based sensors are defined as passive sensors and have an intrinsic advantage over laser and radar sensors: the possibility of acquiring data in a non-invasive way, thus not altering the environment (image scanning is performed fast enough for ITS applications). Moreover, they can be used for some specific applications for which visual information plays a basic role (such as lane marking localisation, traffic sign recognition and obstacle identification) without requiring any modifications to road infrastructure. Unfortunately, vision sensors are less robust than millimetre-wave radar in foggy, night-time or direct sunlight conditions.
Active sensors possess some specific peculiarities which result in advantages over vision-based sensors in this specific application: they can measure quantities such as movement in a more direct way than vision and require less powerful computing resources, as they acquire a considerably lower amount of data. Nevertheless, besides the problem of environment pollution, the wide variation in reflection ratios caused by different factors (such as obstacle shape or material) and the need for the maximum signal level to comply with safety rules, the main problem in using active sensors is interference among sensors of the same type, which could be critical when a large number of vehicles move simultaneously in the same environment, as, for example, in the case of autonomous vehicles travelling on intelligent highways. Hence, foreseeing a massive and widespread use of autonomous sensing agents, the use of passive sensors, such as cameras, gains key advantages over the use of active ones.
Obviously, machine vision does not extend sensing capabilities beyond human possibilities in very critical conditions (e.g., in foggy weather or at night with no specific illumination), but it can compensate when human perception is impaired by a lack of concentration or by drowsiness.
2.2 Vision based intelligent vehicles
Some important issues must be carefully considered in the design of a vision system for automotive applications. In the first place, ITS systems require faster processing than other applications, since vehicle speed is bounded by the processing rate. The main problem that has to be faced when real-time imaging is concerned, and which is intrinsic to the processing of images, is the large amount of data - and therefore computing - involved. As a result, specific computer architectures and processing techniques must be devised in order to achieve real-time performance. Nevertheless, since the success of ITS apparatus is tightly related to its cost, the computing engines cannot be based on expensive processors. Therefore, either off-the-shelf components or ad hoc dedicated low-cost solutions must be considered.
Secondly, in the automotive field no assumptions can be made on key parameters, for example scene illumination or contrast, which are directly measured by the vision sensor. Hence, the subsequent processing must be robust enough to adapt to different environmental conditions (such as sun, rain or fog) and to their dynamic changes (such as transitions between sun and shadow, or the entrance to or exit from a tunnel).
Furthermore, other key issues, such as robustness to vehicle movements and drifts in the camera's calibration, must be handled as well. However, recent advances in both computer and sensor technologies promote the use of machine vision in the intelligent vehicle field. Developments in computational hardware, such as a higher degree of integration and a reduction of the power supply voltage, permit the production of machines that can deliver high computing power with fast networking facilities at an affordable price. Current technology allows the use of SIMD-like processing paradigms even in general-purpose processors that include multimedia extensions.
In addition, current cameras include new important features that permit the solution of some basic problems directly at the sensor level. For example, image stabilisation can be performed during acquisition, while extended camera dynamics allows one to avoid, at least to some extent, the processing required to adapt the acquisition parameters to specific light conditions. The resolution of sensors has been drastically enhanced and, in order to decrease the acquisition and transfer time, new technological solutions can be found in CMOS sensors, such as the possibility of addressing pixels independently as in traditional memories. Another key advantage of CMOS-based sensors is that their integration on the processing chip seems to be straightforward.
Many different parameters must be evaluated in the design and choice of an image acquisition device. First of all, some parameters tightly coupled with the algorithms concern the choice of monocular versus binocular (stereo) vision and the sensor's angle of view (some systems adopt a multi-camera approach, using more than one camera with different viewing angles, e.g. fish-eye or zoom). The resolution and the depth (number of bits/pixel) of the images have to be selected as well (this also includes the selection of colour versus monochrome images).
Other parameters, intrinsic to the sensor, must also be considered. Although the frame rate is generally fixed for CCD devices (25 or 30 Hz), the dynamic range of the sensor is of basic importance: conventional cameras allow an intensity contrast of 500:1 within the same image frame, while most ITS applications require a 10 000:1 dynamic range for each frame and 100 000:1 for a short image sequence. Different approaches have been studied to meet this requirement, ranging from the use of CMOS-based cameras with a logarithmically compressed dynamic range [6], [7] to the interpolation and superimposition of values from two subsequent images taken by the same camera [8].
In conclusion, although extremely complex and highly demanding, computer vision is a powerful means for sensing the environment and has been widely employed to deal with a large number of tasks in the automotive field, thanks to the great deal of information it can deliver (it has been estimated that humans perceive visually about 90% of the environmental information required for driving).
2.3 Advantages and Disadvantages of the Computer Vision Approach
Employing computer vision technology in smart vehicle design calls for consideration of all its advantages and disadvantages. Firstly, a vision subsystem incorporated into the DSS may exploit all the information processed by human drivers, without any requirement for new traffic infrastructure devices (a hard and expensive undertaking). Smart cars equipped with vision-based systems will be able to adapt themselves to operate in different countries (with often quite dissimilar traffic devices). As various technologies have been integrated in the field of traffic engineering, the convenience of computer vision has become more obvious. We may observe this trend, e.g., in the proceedings of the annual IEEE International Conference on Intelligent Vehicles (IVS), where more than 50 percent of the papers focus on image processing and computer vision methods.
Obviously, there are also disadvantages to the vision-based approach. Smart vehicles will operate in real traffic conditions on the road, so the algorithms must be robust enough to give good results even under adverse illumination and weather conditions. For example, Fridtjof Stein, main project manager of the Cleopatra project (Clusters of embedded parallel time-critical applications) [37], said that "reliable optical detection is the biggest hurdle the project must overcome".
It is impossible to assure absolute system reliability, and the system will not be fail-safe. The aim is to provide a level of safety similar to or higher than that of human drivers. Experiments have shown that 60 percent of crashes at intersections and about 30 percent of head-on collisions could have been avoided if the driver had had an additional half-second to react. About 75 percent of vehicular crashes are caused by inattentive drivers.
Chapter 3
Processing
3.1 Processing Strategies
Since sign recognition is generally based on the localisation of specific patterns, it can be performed with the analysis of a single still image. In addition, some assumptions may help and/or speed up the detection process. Due to both physical and continuity constraints, the processing of the whole image can be replaced by the analysis of specific regions of interest only (the so-called focus of attention), in which the features of interest are more likely to be found. This is a generally followed strategy that can be adopted using the results of previously processed frames or assuming a priori knowledge of the road environment.
In some approaches, in particular, windows of interest (WOIs) are determined dynamically by means of statistical methods. For example, the system developed by LASMEA [9] selects the proper window according to the current state and previously detected WOIs. The search for features is an iterative process in which continuous updates of the lane model and of the size of the areas of interest allow the lane detection task to be relatively insensitive to noise.
Other systems adopt a more generic model of the road. The ROMA vision-based system uses a contour-based method [11]. A dynamic road model permits the processing of small portions of the acquired image, therefore enabling real-time performance. Currently, only straight or slightly curved roads without intersections are included in this model. Images are processed using a gradient-based filter and a programmable threshold. The road model is used to follow contours formed by pixels that feature a significant gradient direction value.
3.2 Architectural Issues
In the early years of ITS applications, a great many custom solutions were proposed, based on ad hoc, special-purpose hardware. This recurrent choice was motivated by the fact that the hardware available on the market at a reasonably low cost was not powerful enough to provide real-time image processing capabilities. As an example, the researchers of the Universität der Bundeswehr developed their own system architecture: several special-purpose boards were included in the Transputer-based architecture of the VITA vehicle [12]. Others developed or acquired ad hoc processing engines based on SIMD computational paradigms to exploit the spatial parallelism of images. Among them are the 16k MasPar MP-2 installed on the experimental vehicle NavLab I [13] at Carnegie Mellon University and the massively parallel architecture PAPRICA [16], jointly developed by the University of Parma and the Politecnico di Torino and tested on the MOB-LAB vehicle.
Besides selecting the proper sensors and developing specific algorithms, a large percentage of this first research stage was therefore dedicated to the design, implementation and testing of new hardware platforms. In fact, when a new computer architecture is built, not only do the hardware and architectural aspects - such as instruction set, I/O interconnections, or computational paradigm - need to be considered, but software issues as well. Low-level basic libraries must be developed and tested along with specific tools for code generation, optimisation and debugging.
In the last few years, the technological evolution has led to a change: almost all research groups are shifting toward the use of off-the-shelf components for their systems. In fact, commercial hardware has nowadays reached a low price/performance ratio. As an example, both the new NavLab5 vehicle from Carnegie Mellon and the ARGO vehicle from the University of Parma are presently driven by systems based on general-purpose processors. Thanks to the current availability of fast internetworking facilities, even some MIMD solutions are being explored, composed of a rather small number of powerful, independent processors, as in the case of the VaMoRs-P vehicle of the Universität der Bundeswehr, on which the Transputer processing system has now been partly replaced by a cluster of three PCs (dual Pentium II) connected via a fast Ethernet-based network [10].
Current trends, however, are moving toward a mixed architecture, in which a powerful general-purpose processor is aided by specific hardware such as boards and chips implementing optical flow computation, pattern matching, convolution and morphological filters. Moreover, some SIMD capabilities are now being transferred into the instruction sets of the latest generation of CPUs, which have been tailored to exploit the parallelism intrinsic to the processing of visual and audio (multimedia) data. The MMX extensions of the Intel Pentium processor, for instance, are exploited by the GOLD system, which acts as the automatic driver of the ARGO vehicle, to boost performance.
In conclusion, it is important to emphasise that, although the new generation of systems is all based on commercial hardware, the development of custom hardware has not lost significance, but is gaining renewed interest for the production of embedded systems. Once a hardware and software prototype has been built and extensively tested, its functionalities have to be integrated into a fully optimised and engineered embedded system before marketing. It is in this stage of the project that the development of ad hoc custom hardware still plays a fundamental role, and its costs are justified through a large-scale market.
Chapter 4
Previous work
4.1 Using knowledge about road signs
Knowledge is available that can be exploited to tackle the problem of road sign detection and recognition in an efficient way. All road signs are designed, manufactured and installed according to tight regulations stated by federal councils:

- Colour is regulated not only for the sign category (red = stop, yellow = danger, etc.) but also for the tint of the paint that covers the sign, which should correspond, within a tolerance, to a specific wavelength in the visible spectrum. This is certainly key information, but one should be careful in using it, since the standard was determined under the controlled illumination that prevailed during the experiments, while in practice the weather conditions will have a definite impact on the outdoor illumination and, as a result, on the colours as perceived by the cameras. The paint on signs also deteriorates with time.
- Sign shape and dimensions, along with those of the pictograms, including text font and character height, are also regulated.
- Signs are usually located on the right side of the road, at a distance usually ranging from 2 to 4.5 m from the road edge, which is loosely regulated, with the exception of overhead or clearance signs, which appear over the middle lanes. This fact is useful for sign detection, since a large portion of the road image can be ignored and the processing can thus be sped up.
- Signs may appear in various conditions, including damaged, partly occluded, and highlighted by sunlight. Signs may also be clustered, e.g., three or four signs may appear one above or beside the other.
4.1.1 Colour basics
Colour undoubtedly represents key information for drivers. As a consequence, almost all traffic sign recognition systems found in the literature process colour and acknowledge its importance. Before describing any solution for sign detection based on colour, it is advisable to understand what colour is and how it is dealt with in the computer vision community, particularly with regard to changes in lighting conditions, a major factor for road sign recognition systems, which must operate in every imaginable outdoor lighting condition.
When we see a red object, we tend to believe that the red colour is an intrinsic property of the object. This is not strictly the case. Such a fundamental property would actually be the reflectance of the surface of the object, which alters the incident light rays so that a portion of the light energy is absorbed while the rest is reflected to our eyes (or the camera). The spectral distribution of the reflected rays conveys the chromatic information, and one will acknowledge that it depends not only on the object surface (the interface, the nature of the matter that makes up the object and the pigments of the colorant) but also on the spectral distribution of the incident light.
The following equation expresses the basic idea:

\rho_k = \int_{\text{visible}} E(\lambda)\, S(\lambda)\, R_k(\lambda)\, d\lambda   (4.1)

where \rho_k is a measurement by sensor k, E(\lambda)S(\lambda) is the colour signal perceived by the sensor and R_k(\lambda) is the spectral sensitivity of the sensor. The colour signal is the result of light with spectral power distribution E(\lambda) hitting the surface of an object with spectral reflectance S(\lambda).
This implies, for example, that if a traffic sign is lit with sunlight, which is characterised by a rich spectral distribution with much energy toward the blue wavelengths, then its colour, as measured by a sensor, will be different from that of the same sign lit by a car's headlamps, which have an asymmetrical distribution with high energy in the red portion of the spectrum.
Moreover, the colour that human eyes perceive is actually a sensation that is a function of many parameters of a biological nature and of the environment in which the object appears. This is the reason why a subdomain of the science of colour is concerned with the psychophysics of the phenomenon.
Similarly to the human eye, which has three kinds of receptors (cones) for sensing specific parts of the spectrum, modern CCD cameras perceive colour with three sensors, one for each primary colour: red, green and blue. An object seen by a camera is thus represented by a collection of three-coordinate (R,G,B) pixels. The data space containing the pixels is called the colour space. Apart from the RGB colour space, there are many ways to represent colour depending on the application. For example, the YIQ colour scheme, which is based on linear transformations of the RGB coordinates, is used for broadcasting. In this colour scheme, Y stands for luminance, while the I and Q coordinates carry the chrominance information.
Directly applying a threshold in RGB space is generally not workable, since variations in the ambient light intensity will shift the same colour toward the white corner (r,g,b) = (255, 255, 255) or (for low-energy light) toward the black corner (r,g,b) = (0, 0, 0).
4.1.2 The HSI colour space
An interesting space is HSI (Hue, Saturation and Intensity), which has the distinctive feature of being similar to the way colours are perceived by humans. The first coordinate, Hue (H), represents the actual colour or tint information. Saturation (S) indicates how deep or pure the colour is, e.g., red is deeper than pink. Intensity (I) is simply the amount of light. RGB coordinates can be mapped to HSI space with the use of non-linear transformations (from Digital Image Processing, Gonzalez et al. [48]):
H = \begin{cases} \theta & \text{if } B \leq G \\ 360^\circ - \theta & \text{if } B > G \end{cases}   (4.2)

with

\theta = \cos^{-1}\left\{ \frac{\frac{1}{2}\left[(R-G) + (R-B)\right]}{\left[(R-G)^2 + (R-B)(G-B)\right]^{1/2}} \right\}   (4.3)

The saturation component is given by

S = 1 - \frac{3}{R+G+B}\,\min(R, G, B)   (4.4)

Finally, the intensity component is given by

I = \frac{1}{3}(R+G+B)   (4.5)
Even though variations on these transformations exist, the HSI space is always viewed as a cone-shaped space with the position of a point expressed in cylindrical coordinates, i.e., the triplet (H, S, I) corresponds to the cylindrical coordinates (\theta, r, z). The HSI colour space is certainly appealing for colour processing because chromatic information is represented by the hue coordinate, and varying light conditions are absorbed (to some extent) by the intensity coordinate. However, there are difficulties in using this space:

- There is a singularity in the hue dimension along the grey-level axis (R = G = B).
- The hue coordinate is unstable near this same axis, i.e., small perturbations in the RGB signals may cause strong variations in hue.
Properties:

- Hue is multiplicative/scale invariant: hue(R, G, B) = hue(aR, aG, aB) for all a such that (aR, aG, aB) ∈ [0, 255]^3.
- Hue is additive/shift invariant: hue(R, G, B) = hue(R+b, G+b, B+b) for all b such that (R+b, G+b, B+b) ∈ [0, 255]^3.
The second point underlines that hue is invariant under saturation changes, which means that the tint of an object can still be recovered even if the object is lit by an intensity-varying illumination source. In fact, Perez and Koch [17] show that the hue coordinate is unaffected by the presence of highlights and shadows on the object, as long as the illumination is white (i.e. of equal energy in the red, green and blue colour bands).
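As an illustration, here is a minimal Python sketch of the RGB-to-HSI mapping defined by equations (4.2)-(4.5). The function name and the use of NumPy are my own choices for illustration, not part of any referenced system.

```python
import numpy as np

def rgb_to_hsi(r, g, b):
    """Map one RGB triplet (0-255) to (H, S, I) per equations (4.2)-(4.5).
    H is in degrees [0, 360), S in [0, 1], I in [0, 255]."""
    r, g, b = float(r), float(g), float(b)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    # Clip guards against rounding pushing the cosine argument past 1.
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0))) if den > 1e-12 else 0.0
    h = theta if b <= g else 360.0 - theta                                # eq. (4.2)
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b) if r + g + b > 0 else 0.0  # eq. (4.4)
    i = (r + g + b) / 3.0                                                 # eq. (4.5)
    return h, s, i

# Scale invariance of hue: the tint survives an intensity change.
print(rgb_to_hsi(200, 40, 40)[0])   # ~0 degrees (red)
print(rgb_to_hsi(100, 20, 20)[0])   # the same hue at half the intensity
```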
4.2 Object detection and recognition using colour
Colour-based object detection and recognition has a definite appeal to researchers and developers in machine vision. With colour, it is possible to efficiently perform tasks such as object sorting, quality control, counting, or structure measurement in medical imaging. Various methods have been published for colour object detection.
One approach is to use neural networks that are trained to recognise patterns of colours. For example, Krumbiegel et al. [18] used a neural net as a classifier for recognising traffic signs within a region of interest. Another idea is to compare colour histograms. Dubuisson and Jain [19] exploited this idea in the design of a car-matching system for measurements of travel times. A first camera grabs an image of a passing car and the computer saves its histogram to a database. A second camera is set up further down the road to get the image of another car, hopefully the same car that was imaged by the first camera. The computer uses the colour histogram to index the image into the database, and the potential candidates are then analysed for shape similarity.
One could imagine a database of road signs along with their histograms, and a sign detection module that would analyse various portions of the image in order to estimate the probability of finding a sign at each location. The drawback of this is the sensitivity to lighting changes, as it deals with RGB pixels.
Projects using colour information for image segmentation:
- Ghica et al. [20] use look-up tables on the RGB colour space to filter out unwanted colours.
- Kehtarnavaz [21] exclusively processes stop signs. The colours of a road scene are mapped to the HSI colour space, in which sub-spaces have been defined according to a statistical study of stop signs (i.e. 3° ≤ hue ≤ 56°, saturation > 15 units, intensity < 84 units; the units are not specified by the authors, but the values are probably in the [0, 255] interval). A binary image is then constructed from the pixels falling into the sub-space (see the sketch after this list).
- An interesting work is that of Priese and Rehrmann [22], who proposed a new parallel segmentation method based on region growing. The approach is in fact hierarchical in the sense that subsets of pixels (in the HSI space) are grouped at different levels so that every object can be represented by a tree. The work has been incorporated into a complete traffic sign recognition system [23], for which colour classes have been set up according to the types of signs to be recognised.
- The authors of [24] also work in HSI space. They apply a region-growing process to get colour patches, which in turn undergo a shape analysis in a later stage.
- Krumbiegel et al. [18] explore the connectionist approach to road sign detection. An artificial "retina" is made up of three multilayer neural networks for the three RGB planes and a control unit. The nets act as correlators by giving a high output if the region of interest covered by the retina contains a sign. If no sign is present, the control unit forces the retina to shift to another portion of the image.
- Kellmeyer and Zwahlen [25] follow the same line in using the same type of neural net for road scene segmentation. Inputs to the net are colour differences, defined as (proportion of red - proportion of green) and (2 x proportion of blue - proportion of yellow), between each pixel and its neighbours. The eight outputs correspond to a palette of eight colours (red, orange, yellow, green, blue, violet, brown and achromatic), which are the most discriminating colours for traffic signs. All pixels are processed by the net. However, since only warning signs are sought, only the yellow patches are further processed for shape analysis.
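As a minimal sketch of this kind of HSI sub-space thresholding: the threshold constants below follow the stop-sign study quoted above, while the function name, the NumPy usage and the assumption that H, S and I planes are already available (e.g. via the transformation in section 4.1.2) are my own.

```python
import numpy as np

def stop_sign_mask(h, s, i):
    """Binary mask of pixels inside the stop-sign HSI sub-space.
    h: hue plane in degrees; s, i: saturation and intensity planes
    in the (unspecified, probably [0, 255]) units of the study."""
    return (h >= 3) & (h <= 56) & (s > 15) & (i < 84)

# Usage on per-pixel planes of equal shape:
# mask = stop_sign_mask(H, S, I)   # True where a stop-sign-red pixel is likely
```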
4.3 Detection using Shape
The search for shape-based sign detection algorithms can take advantage of the tremendous research effort in the field of object recognition, where techniques have been developed for scene analysis in robotics, solid (3D) object recognition, part localisation in CAD databases and so on. The intrinsic complexity of the task to be performed leads one to think that a model-based approach is needed. It would yield an elegant solution for object detection in cluttered scenes and allow new shapes to be added to the set of shapes under tracking. There are a number of difficulties for shape-based sign detection:

- The signs appear in cluttered scenes, which increases the number of candidates, since many objects typical of urban environments can be confused with road signs from a shape point of view: building windows, commercial signs (e.g. real-estate signs), various man-made objects (e.g. cars, mailboxes), etc.
- The signs do not always have a perfect shape (corners may be torn, other signs may be attached to them, etc.) and some are tilted with respect to the vertical axis.
- A significant problem with sign detection is related to variance in scale: signs get bigger as the vehicle moves toward them. The detection module must obviously handle these variations.
- Not only do the signs have a varying size, they also appear relatively small: 40-50 pixels wide, at the most.
- Another difficulty is linked to the way the signs are captured by the acquisition system. There may be a non-zero angle between the optical axis of each camera and the normal vector of the sign surface. This angle may be as high as 30 degrees, depending on the distance between the sign and the camera.
4.4 Algorithm considerations
A final algorithm should meet the following requirements:

- The detection module should operate in real time. One should thus look for efficient, low-complexity matching algorithms.
- The detection module should be flexible and adaptable to different conditions (such as the speed of the vehicle, day/night conditions, exits of tunnels, changes in light, etc.).

The selection of an object recognition scheme for the detection of road signs based on their shape will have to address a number of issues, such as the type of object representations, the ways to create a model for a new object, and how to deal with uncertainties, e.g., imperfect models. Scale invariance is a desirable property of object representations. Past studies of shape detection in road sign recognition include:
- When looking for stop signs, Kehtarnavaz et al. [21] extract the red components of the image, perform edge detection and then apply the Hough transform to characterise the sides of the sign. A specific criterion is used to confirm the presence of a stop sign.
- The approach chosen by Saint-Blancard [26] is more refined. The acquired image strongly reveals red components due to the presence of a red cut-off filter in front of the camera lens. The image is filtered with an edge detection algorithm (viz. the differential Nagao gradient) and the result is scanned for contour follow-up using Freeman coding. The appropriate contours are then analysed with respect to a set of features consisting of perimeter (number of pixels), outside surrounding box, surface (inside/outside contour within the surrounding box), centre of gravity, compactness (aspect ratio of the box), polygon approximation, Freeman code, histogram of the Freeman code, and average grey level inside the box. Classification based on these features is done by a neural network (Restricted Coulomb Energy) or an expert system. The author reports good results, especially in terms of processing speed (contour detection in 0.6 s using a DSP-based vision board).
- The work by Priese [23] is more model-based, since the basic shapes of traffic sign components (circles, triangles, etc.) are predefined by 24-edge polygons describing their convex hulls. At the detection stage, patches of colour extracted by the segmentation algorithm are collected in object lists, with one list for each sign colour (blue, white, red, ...); all objects are then encoded and assigned a probability (based on an edge-to-edge comparison between the object and the model) to characterise their membership of specific shape classes. A decision tree is constructed for the final classification. As far as processing times go, the analysis (colour and shape) of all objects in a 512x512 image takes about 700 ms on a Sparc10 machine. The concept is interesting, but the shape modelling part is weak, as is the decision-making process.
- A novel shape detection technique has been proposed by Besserer et al. [27], and it has been integrated into a traffic recognition system. The method classifies chain-coded objects according to evidence (in the Dempster-Shafer sense) supplied by knowledge sources. These sources are a corner detector, a circle detector and a histogram-based analyser which reports the number of main directions; the classes are the circle, the triangle and the polygon. When an unknown shape is presented to the module, each knowledge source studies the pattern and computes a basic probability assignment attached to the feature found. These probabilities are combined with Dempster's rule, and a semantic network builds up the belief, for each class, that the unknown shape is a member of that class. The authors claim that this method is flexible and reliable, but point out that the limiting factor is the segmentation quality, which can be improved using colour information.
Chapter 5
Techniques
5.1 Interesting Techniques
Techniques referred to as "rigid model fitting" in [28] may be promising. Many of these use specific model representations and a common matching mechanism, called geometric hashing, for indexing into a model database. The idea of geometric hashing is to build a hash table based on the chosen representation (geometrical information) for each object; at the recognition stage, the hash table is used to make a correspondence between a collection of features and potential object models. For example:
- Lamdan and Wolfson [29] use a model description in terms of interest points, which are described in a transformation-invariant coordinate frame and stored in a hash table along with the model itself. For recognition, the image is scanned for interest points, whose coordinates are then transformed to be transformation-invariant; votes are compiled for the models that are pointed to through the hash table, and those that collect enough votes are more closely examined against the portions of the image that triggered the votes. The principle can be generalised to collections of edges or surfaces. The authors claim that this method is very efficient even when objects are occluded (affine transformations are naturally handled), but if the number of features is too high, many false alarms might arise.
- Stein and Medioni [62] have presented an idea using super-segments for the polygonal approximation of 2D shapes. A super-segment is made up of a fixed number of adjacent segments and is encoded as a key to a hash table (the coding is a collection of angles between pairs of segments plus an eccentricity parameter similar to a compactness index). At the recognition stage, the image is filtered with a Canny edge detector, boundary tracing is performed and a line-fitting algorithm is used for polygonal approximation. The resulting super-segments trigger hypotheses via the hash table (in the form "super-segment i belongs to object model j") and a consistency check allows the identification of the best matching model. It is mentioned that the approach can recognise models in the presence of noise, occlusion, scaling, rotation, translation and weak perspective. An example is given where the system manages to localise aircraft in an aerial image of an airport (a very cluttered scene).
These algorithms might tend to be slow, and in their crude form they indeed are. However, in road sign recognition more knowledge can be used to make the implementation efficient and real-time capable. Colour information could be taken into account, as well as knowledge about the position in the image, and so on.
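To make the hashing mechanism concrete, here is a toy Python sketch of geometric hashing with two-point bases (my own illustration, not code from [28] or [29]): model points are expressed in a similarity-invariant frame for every ordered basis pair, quantised, and used as hash keys; at recognition time, scene bases vote through the same table.

```python
import numpy as np
from collections import defaultdict
from itertools import permutations

def basis_coords(points, p0, p1):
    """Express points in the similarity-invariant frame defined by basis (p0, p1)."""
    v = p1 - p0
    rot = np.array([[v[0], v[1]], [-v[1], v[0]]]) / (v @ v)
    return (points - p0) @ rot.T

def build_table(models, q=0.25):
    """Offline stage: hash quantised invariant coordinates -> (model, basis) pairs."""
    table = defaultdict(list)
    for name, pts in models.items():
        pts = np.asarray(pts, float)
        for i, j in permutations(range(len(pts)), 2):
            for c in basis_coords(pts, pts[i], pts[j]):
                table[tuple(np.round(c / q))].append((name, (i, j)))
    return table

def recognise(table, scene_pts, q=0.25):
    """Online stage: try scene bases and vote for the (model, basis) entries hit."""
    pts = np.asarray(scene_pts, float)
    votes = defaultdict(int)
    for i, j in permutations(range(len(pts)), 2):
        for c in basis_coords(pts, pts[i], pts[j]):
            for entry in table.get(tuple(np.round(c / q)), ()):
                votes[entry] += 1
    return max(votes, key=votes.get) if votes else None

# Toy usage: a triangle model found again after translation and 2x scaling.
table = build_table({"triangle": [(0, 0), (4, 0), (2, 3)]})
print(recognise(table, [(10, 10), (18, 10), (14, 16)]))  # ('triangle', ...) wins
```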
5.2 Sign Recognition
To satisfy the real-time demand, most sign recognition systems use fast techniques (e.g. neural networks) instead of more elaborate, more precise but slower approaches.

- A common approach is template matching. For example, Piccioli et al. [30] use a template matching algorithm. All signs are stored in a database. Each potential sign is normalised in size and compared to every template of the same shape. Normalised cross-correlations are used (see the sketch after this list).
- Akatsuka and Imai [31] restricted the recognition to speed signs, so their content recognition algorithm is in fact a digit classifier. A histogram analysis helps to determine the position and size of each digit; the actual matching is then done through correlation with digits from standard speed signs.
- Estable et al. [32] relied on Radial Basis Function networks for pictogram recognition. A collection of sign images with hand-defined regions of interest enclosing the signs is used for training the neural nets. Some colour processing is performed in order to enhance sign features such as the coloured frame around signs, pictograms, etc. After training, each net has the task of detecting a specific coloured frame or a specific pictogram. The decision is taken following inspection of the best responses supplied by the RBF networks.
- Structural strategies include those that decompose the pictogram into its basic elements (circle, arrow, etc.), assess their exactness with respect to the model and combine the various measures to yield a global similarity/dissimilarity index. Classification using Fourier descriptors for each component might also be used. Dynamic programming on separate elements could also be possible, considering that the search space is well defined within the sign boundaries.
- Global strategies do not try to decompose the pictogram into salient features. Instead the symbol image is recoded into a compact representation by means of a data compression technique such as a neural network, vector quantisation, or the Karhunen-Loève (K-L) transform, and this representation is used as the input to a standard classifier.
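As an illustration of the first item, here is a minimal sketch of the normalised cross-correlation between a size-normalised candidate patch and a template; the implementation details are my own, since [30] gives none here.

```python
import numpy as np

def ncc(patch, template):
    """Normalised cross-correlation of two equal-sized grey-level images.
    Returns a value in [-1, 1]; 1 means a perfect match up to gain/offset."""
    a = patch.astype(float) - patch.mean()
    b = template.astype(float) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

# Usage: pick the best-scoring template of the same shape from the database.
# best_sign = max(templates.items(), key=lambda kv: ncc(candidate, kv[1]))
```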
5.3 Multi-feature Hierarchical Template Matching Using Distance Transforms
From Gavrila [14]. Since matching is a central problem in pattern recognition, an efficient implementation is crucial for computer vision algorithms that base their recognition phase on this kind of approach.
Figure 5.1: (a) Original traffic scene, from [14]
Figure 5.2: (b) Template, from [14]
Figure 5.3: (c) Edge image, from [14]
This method uses distance transforms (DTs), which transform a binary image consisting of feature and non-feature pixels (e.g. an edge image) into a DT image where each pixel value denotes the distance to the nearest feature point; see figure 5.5. The object that is to be matched is described using the same scheme. Matching proceeds by correlating the template against the DT image. The correlation value is a measure of similarity in image space. Particular DT algorithms depend on a variety of factors; whether or not a Euclidean distance metric is used is one of them. Another possibility is to use the chamfer-2-3 metric. See for example Borgefors [33] for more information.
Figure 5.4: (d) DT image, from [14]
Figure 5.5: A binary pattern and its Euclidean Distance Transform, from [14]
Matching a template T with an image I consists of computing the distance transform of I and correlating the template T against it. The matching distance is determined by the pixel values of the DT of the image which lie under the "on" pixel values of the template image. One possible measure of the similarity at the template position is the chamfer distance:

D_{\text{chamfer}}(T, I) = \frac{1}{|T|} \sum_{t \in T} d_I(t)   (5.1)

where |T| denotes the number of template features and d_I(t) denotes the distance between feature t and the closest feature in I. A template is considered matched if the distance measure D(T, I) is below a user-supplied threshold.

A coarse-to-fine approach can be used to speed up the correlation testing. The resolutions of T and I can be reduced and matching tried with the smaller images; if a sufficiently good match is found, the matching can be continued at higher resolution where there might be a match. Also, if several templates are to be matched in the same image, they too can be grouped together and replaced by prototypes. For circles this can be concretised by replacing all circles with ranges of circles, where the prototype for each range is the circle whose radius equals the median value of the interval. These ranges can be organised as a binary tree, each level further reducing the intervals.
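A minimal sketch of the chamfer measure at one template position, using SciPy's Euclidean distance transform; the function names and the choice of scipy.ndimage are my own, and Gavrila's full system (hierarchy, typed templates) is much more elaborate.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_distance(template, edge_image):
    """Equation (5.1): mean DT value of the image under the template's "on" pixels.
    template, edge_image: boolean arrays of equal shape (True = feature pixel)."""
    # distance_transform_edt measures the distance to the nearest zero element,
    # so invert: feature pixels become 0 and background pixels carry the distance.
    dt = distance_transform_edt(~edge_image)
    return dt[template].mean()

# Usage: slide the template over candidate positions and accept a match
# wherever chamfer_distance(...) falls below the user-supplied threshold.
```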
To reduce the number of false positives, one can also divide the image into several feature spaces. One possibility is to consider only edges in one direction at a time. The directions can be found by dividing the unit circle into M bins:

\left\{ \left[\tfrac{i}{M} 2\pi,\; \tfrac{i+1}{M} 2\pi\right] \;\middle|\; i = 0, \ldots, M-1 \right\}   (5.2)

Thus a template edge with edge orientation \theta is assigned to the typed template with index

\left\lfloor \tfrac{\theta}{2\pi} M \right\rfloor   (5.3)

Errors in the measurement of edge orientation must be considered, so each edge point is assigned to a range of templates within the tolerance of the error in the angle measurement. The distance measure will be the sum of the distance measurements between the M templates and the M feature images at the examined location.

Figure 5.6: Matching using a DT, from [14]
Figure 5.7: Results using DT transforms, from [14]
5.4 Shape recognition using convex hull approximation
The discrimination of the shape can be done by approximating the shape with a polygon. The shape of an object can be encoded by its convex hull, and a good approximation of the convex hull is a 24-sided regular polygon, where neighbouring edges are at 15 degrees to each other. A circle will then be characterised by 24 edges of nearly the same length, while a square will have four sides of approximately the same length at 90-degree angles to each other.
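A minimal sketch of this discrimination, assuming contour points are already extracted: the 24-bin edge-direction quantisation follows the text, while the helper names and the classification thresholds are my own illustrative choices.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_edge_histogram(points, bins=24):
    """Histogram of convex-hull edge length over `bins` direction classes
    (15-degree classes for bins=24). points: (N, 2) array of contour points."""
    points = np.asarray(points, float)
    hull = points[ConvexHull(points).vertices]        # hull vertices in order
    edges = np.roll(hull, -1, axis=0) - hull          # consecutive edge vectors
    angles = np.arctan2(edges[:, 1], edges[:, 0]) % (2 * np.pi)
    idx = (angles / (2 * np.pi) * bins).astype(int) % bins
    hist = np.zeros(bins)
    for k, e in zip(idx, edges):
        hist[k] += np.hypot(*e)                       # accumulate edge length
    return hist

def classify(hist):
    """Illustrative decision: a circle spreads length over nearly all 24 classes,
    a square concentrates it in about four classes 90 degrees apart."""
    occupied = int((hist > 0.05 * hist.sum()).sum())
    return "circle" if occupied >= 12 else "square-like" if occupied <= 6 else "other"
```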
5.5 Temporal Integration
Simple tracking techniques involve few assumptions about the world and rely solely on what is observed in the images to estimate object motion and establish correspondence. More sophisticated techniques model the camera geometry and vehicle speed to achieve better motion estimates. For example, [34] considers a vehicle driving straight at constant velocity and uses a Kalman-filter framework to track the centres of detected traffic signs. Once correspondence is established over time, integration of the recognition results is done by simple averaging techniques, where larger weights are given to recognition results for traffic signs closer to the camera.
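A minimal sketch of such distance-weighted integration of per-frame classifier scores; this only illustrates the averaging idea attributed to [34], and the choice of the apparent sign width as the weight is my own assumption.

```python
import numpy as np

def integrate_track(score_vectors, sign_widths_px):
    """Fuse per-frame class scores for one tracked sign.
    score_vectors: (frames, classes) array of classifier outputs.
    sign_widths_px: apparent sign width per frame; a wider sign is closer
    to the camera, so its frame receives a larger weight."""
    w = np.asarray(sign_widths_px, float)
    w /= w.sum()
    fused = (w[:, None] * np.asarray(score_vectors, float)).sum(axis=0)
    return int(np.argmax(fused))      # index of the winning sign class

# Three frames of one track, scores over the classes [30, 50, 70] km/h:
scores = [[0.2, 0.5, 0.3], [0.1, 0.7, 0.2], [0.0, 0.9, 0.1]]
print(integrate_track(scores, sign_widths_px=[18, 26, 40]))   # -> 1 (the 50 sign)
```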
5.6 Colour classification with Neural Networks
An article by Rogahn [35] describes experiments using a neural network to classify pixels into the output classes "sign pixel" and "non-sign pixel". Using a neural network for the colour classification has the benefit that a better solution might be found than a human designer would come up with. The design of the network is as follows:

Input layer: 3x3x3 (x, y, colour space)
Hidden layer 1: 6 nodes
Hidden layer 2: 3 nodes
Output layer: 3 (colour space), with range 0-1 (0 = not a sign, 1 = is a sign)

The training algorithm used is hybrid delta-bar-delta backpropagation with skip. A different colour space is used for the input, similar to YUV and the human visual system: YGrBy, where Y = red + green, Gr = green - red, By = blue - Y. A neural network for detecting edges had a similar design but with one output node. The networks are trained using noise-free and noisy images. It seems, however, that changes of illumination in the image can reduce the classification result, especially for the colour classification.
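A minimal sketch of the YGrBy input transform described above; the per-channel definitions follow the article, while the vectorised NumPy form is my own.

```python
import numpy as np

def rgb_to_ygrby(rgb):
    """Convert an (H, W, 3) RGB image to the YGrBy opponent-colour space:
    Y = R + G, Gr = G - R, By = B - Y."""
    r, g, b = (rgb[..., k].astype(float) for k in range(3))
    y = r + g
    return np.stack([y, g - r, b - y], axis=-1)
```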
5.7 Recognition in Noisy Images Using Simulated Annealing
"Fast Object Recognition in Noisy Images Using Simulated Annealing" is described in [36].

Simulated annealing is a search technique for solving optimisation problems. Its name originates from the process of slowly cooling molecules to form a perfect crystal. The cooling process, and its analogous search algorithm, is an iterative process controlled by a decreasing temperature parameter. If the search problem involves different kinds of parameters, the annealing algorithm is analogous to the cooling of a mixture of liquids, each of which has a different freezing point.

A model image M(x, y) should be matched in the image I(x, y). From the model M(x, y), templates T(x, y) can be generated by choosing parameters that describe a transformation of M into T. The parameters used are a rotation parameter and two sampling parameters s_x and s_y that define the number of samples along the template's coordinate axes (i.e. the scale factors). Thus new templates can be generated online from a given model image.
Figure 5.8: Template generation from a model template, [36]
The recognition problem is defined as follows. An object in the image I is defined to be recognised if it correlates highly with a template image T of the hypothesised object. This template image T is a transformed version of the model of the hypothesised object. Model images can be stored in a library. A correlation coefficient is defined which measures how accurately a subpart of the image can be approximated by the template T. Since a model normally does not fill a squared region, only the non-zero pixels in the template are compared with the pixels in the image.
The dimension of the search space is determined by the number of possibilities for position, size, shape and orientation of the object to be found. The number of possibilities for the centroid of the object in the image is O(n^2) for an n x n image. Assuming that the width and height of the object can be approximated by sampling the model along two perpendicular axes, the number of possibilities to approximate the size and shape of the object is also O(n^2). The number of possible angles is very large, but since the image is discrete it can be assumed that the number of angles is O(n). Thus the size of the search space is O(n^5) for an n x n image. An exhaustive search would take too long.
Terminology from the radar and sonar literature is used to describe the search space. The search space is called an ambiguity surface. A peak in the surface means that the correlation coefficient is high for a particular set of parameters. There may be several peaks in an ambiguity surface. If the template and the object in the image match perfectly, the cross-correlation between template and image results in a peak in the ambiguity surface which is the global optimum. An iterative search such as steepest descent risks getting stuck in local optima. Simulated annealing is able to jump out of local optima and find the globally best correlation value.
Figure 5.9: Traffic scene and its ambiguity surface for all possible translations using fixed scaling and rotation parameters. Simulated annealing is used to find the best correlation value (here the darkest pixel value), [36]
At each iteration of the algorithm new templates are generated online by randomly perturbing the values for location, sampling and rotation from the current values. If the new correlation coefficient r_j increases over the previous coefficient r_{j-1}, the new parameters are accepted in the j-th iteration (as in the gradient method). Otherwise they are accepted if

e^{-(E_j - E_{j-1})/T_j} > \epsilon   (5.4)

where \epsilon is randomly chosen in [0, 1], T_j is the temperature parameter and E_j = 1 - r_j is the cost function in the j-th iteration. For a sufficient temperature this allows jumps out of local optima. T_j = T_0 / j is taken as the cooling temperature for the j-th update of the temperature, where T_0 is the initial temperature. The criterion for stopping is chosen to be a limit L on the search length.
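The acceptance rule and cooling schedule translate directly into code. The following is a minimal generic sketch (correlation and perturb are placeholders for the template correlation of [36] and a random perturbation of the location/sampling/rotation parameters):

```python
import math
import random

def simulated_annealing(correlation, init_params, perturb, T0=1.0, L=300):
    # Keep a better template; accept a worse one with probability
    # exp(-(E_j - E_{j-1}) / T_j), where E = 1 - correlation (eq. 5.4)
    # and T_j = T0 / j is the cooling schedule.
    params = init_params
    energy = 1.0 - correlation(params)
    best, best_e = params, energy
    for j in range(1, L + 1):
        T = T0 / j
        cand = perturb(params)            # random step in parameter space
        e = 1.0 - correlation(cand)
        if e < energy or math.exp(-(e - energy) / T) > random.random():
            params, energy = cand, e
            if e < best_e:
                best, best_e = cand, e
    return best, 1.0 - best_e
```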
An algorithm that randomly perturbs all parameters at the same time has poor convergence properties. Therefore, at a specific temperature, the tests for the location, sampling and rotation angle are not combined. Good results are obtained by using simulated annealing only for the location parameters and a gradient descent with large enough random perturbations for the remaining parameters.
A false match occurs when a wrong part of the image gets a higher correlation value in the simulated annealing process than the correct one. This can happen if the information content of a template is not very high. For example, the yield sign can give high correlation values wherever there is a horizontal bar with dark regions above and below. The information content can be determined using the coherence area (see Betke [36] for details). The yield sign has a large coherence area (197), meaning that even if the sign is moved a bit from a perfect match it will still give a high correlation value; lots of similar objects will give high correlation values for this model. The stop sign has a coherence area of only 56, meaning that only similar objects and good transformation values give high correlation values.
The method produces good results even on noisy images. The authors advocate the use of template matching for recognition tasks. The templates can be constructed online, so the method is well suited for recognition tasks that involve objects with scale and shape variations. The method described is for grey-scale images, and an extension to colour images could be interesting to pursue. However, it might be too slow for real-time applications: on a 112 x 77 pixel image their implementation found the sign in 15 seconds after 300 iterations. This is still a drastic improvement over the exhaustive search, which took more than 10 hours.
5.8 Using a Statistical Classifier

A pattern recognition problem can often be modelled as a statistical decision problem, a theoretical approach where the Bayesian paradigm can be applied. One wishes to classify a feature vector x \in R^D into one of C mutually exclusive classes, knowing that the class of x, denoted \omega, takes values in \Omega = \{\omega_1, . . . , \omega_C\} with probabilities P(\omega_1), . . . , P(\omega_C), respectively, and that x is a realisation of a random vector X characterised by a class-conditional probability density function f(x|\omega), \omega \in \Omega. The task is to find a mapping d: R^D \to \Omega such that the expected loss function R(d) = E\{\lambda(d(X), \omega)\}, called the risk, is minimal. Here \lambda(\omega_i, \omega_j) is the loss or penalty incurred when the decision d(x) = \omega_i is made and the true pattern class is in fact \omega_j, j = 1, 2, . . . , C.

It can be assumed, without loss of generality, that \lambda(\omega_i, \omega_j) = 0 for \omega_i = \omega_j and \lambda(\omega_i, \omega_j) = 1 for \omega_i \ne \omega_j; then R(d) = P(d(x) \ne \omega) is called the probability of error. The optimal rule d_{opt} (called the Bayes rule), which minimises R(d), is of the following form:

d_{opt} = \arg\max_{1 \le i \le C} P(\omega_i | x)   (5.5)

where
P(\omega_i | x) = \frac{P(\omega_i) f(x | \omega_i)}{\sum_{j=1}^{C} P(\omega_j) f(x | \omega_j)},   i = 1, . . . , C   (5.6)

are the posterior probabilities. R_{opt} then denotes the Bayes risk (the risk of the rule in equation 5.5).
In practice we rarely have any information about the distribution of (x, \omega); instead we have a set of samples T_N = \{(x_i, \omega_i)\}_{i=1}^{N}, i.e. a sequence of pairs (x_i, \omega_i) distributed like (x, \omega), where x_i is the feature vector and \omega_i is its class assignment. The set T_N of samples is called the training set.

An empirical classification rule d_N is a function of X and T_N. It is natural to construct the rule by replacing P(\omega_i | x) in 5.5 by some estimate \hat{P}(\omega_i | x). Such a rule can be defined as

d_N = \arg\max_{1 \le i \le C} \hat{P}(\omega_i | x)   (5.7)
5.8.1 Statistical Classifier with cascade classification

It has been demonstrated that road signs cannot be efficiently recognised with a monolithic classifier. This is firstly because the number of distinct classes would exceed a feasible limit; Kotek ([38] pp. 178) recommends a maximum class count of 10 and a highest feature count of 20. Moreover, the higher the number of classes, the longer the classifier's decision will take, since decisions have to be taken for all classes one by one.

The solution is a cascade classifier [39], where the recognition problem is divided into several small recognition tasks using specific a priori knowledge. The subproblems are then covered by small classifiers. The classification process has the form of an n-ary tree with classifiers at each node and a verification process at the leaves. The number of classes to be considered at the leaves is greatly reduced compared to the number of initial classes, which gives a higher classification speed and more failsafe results. The misclassification risk between separate groups of road signs will also be minimised. If the algorithm rejects a sample before the leaf classifier, the partial results still contain important information about the road sign group (contrary to the monolithic classifier, which offers only an all-or-nothing approach). Finally, the particularities within each group may be highlighted using the most descriptive features.
In a statistical approach to pattern recognition, the Bayes rule 5.6 is used in the classifier's design. The a priori probability P(\omega_i) may be replaced by sample frequencies, but the probability density f(x|\omega_i) must be estimated from the training data set. For this there are two possibilities: if the shape of the probability density is known but not its parameters, the classifier is a parametric classifier; if the density shape is unknown, the classifier is nonparametric. Since the distribution type for road signs is unknown, a nonparametric classifier must be used. The probability density and its parameters are learned from the training data. Hence classification quality depends mainly on the selection of characteristic training samples and on parameter estimation. An example of a nonparametric approach is the kernel classifier; see [40] for details.
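As an illustration of such a plug-in rule, here is a minimal sketch of a kernel (Parzen) classifier; the Gaussian kernel and the bandwidth h are assumptions for the sketch, while the actual classifier design is the one described in [40]:

```python
import numpy as np

def kernel_classifier(train_x, train_y, x, h=1.0):
    # Plug-in Bayes rule (eqs. 5.5-5.7): f(x|class) is estimated with
    # Gaussian kernels of bandwidth h around the training samples,
    # P(class) with sample frequencies, and the class maximising
    # P(class) * f(x|class) is returned.
    best_cls, best_score = None, -np.inf
    for c in np.unique(train_y):
        xs = train_x[train_y == c]                    # samples of class c
        prior = len(xs) / len(train_x)                # sample frequency
        d2 = np.sum((xs - x) ** 2, axis=1)            # squared distances
        density = np.mean(np.exp(-d2 / (2 * h * h)))  # unnormalised KDE
        if prior * density > best_score:
            best_cls, best_score = c, prior * density
    return best_cls
```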
5.8.2 How to get features from an image
There exists no general theory yet about passing from raw image data to quality features. Successfully implemented methods from the literature may be used. For road signs, the entities carrying information are colour combinations, shape and ideogram. The features selected in this classifier are:
Unscaled Spatial Moment M_U

M_U(m, n) = \sum_{j=1}^{J} \sum_{k=1}^{K} (x_k)^m (y_j)^n F(j, k)   (5.8)

Coordinates x_k and y_j are defined as:

x_k = k - 1/2   (5.9)

y_j = J + 1/2 - j   (5.10)

where this coordinate transformation is described by Pratt [41].
Scaled Spatial Moment M

M(m, n) = \frac{M_U(m, n)}{J^n K^m} = \frac{1}{J^n K^m} \sum_{j=1}^{J} \sum_{k=1}^{K} (x_k)^m (y_j)^n F(j, k)   (5.11)
Unscaled Spatial Central Moment U_U

Central moments are made invariant against translation by using the shape centroid (\bar{x}, \bar{y}). The moment M(0, 0) denotes the sum of the pixel values in the image.

U_U(m, n) = \sum_{j=1}^{J} \sum_{k=1}^{K} (x_k - \bar{x}_k)^m (y_j - \bar{y}_j)^n F(j, k)   (5.12)

where \bar{x}_k and \bar{y}_j are defined as:

\bar{x}_k = \frac{M(1, 0)}{M(0, 0)}   (5.13)

\bar{y}_j = \frac{M(0, 1)}{M(0, 0)}   (5.14)
Scaled Spatial Central Moment U

U(m, n) = \frac{1}{J^n K^m} \sum_{j=1}^{J} \sum_{k=1}^{K} (x_k - \bar{x}_k)^m (y_j - \bar{y}_j)^n F(j, k)   (5.15)

Here the centroid coordinates \bar{x}_k and \bar{y}_j are defined using unscaled moments:

\bar{x}_k = \frac{M_U(1, 0)}{M_U(0, 0)}   (5.16)

\bar{y}_j = \frac{M_U(0, 1)}{M_U(0, 0)}   (5.17)
Normalized Unscaled Central Moment V

V(m, n) = \frac{U_U(m, n)}{[M(0, 0)]^\alpha}   (5.18)

where

\alpha = \frac{m + n}{2} + 1   (5.19)

The normalisation of the unscaled central moments has been proposed by Hu.
Hu's Invariants h_i (the first four; more computationally complex invariants also exist)

These absolute invariants (introduced by Hu) are invariant under translation and general linear transformation.

h_1 = V(2, 0) + V(0, 2)   (5.20)

h_2 = [V(2, 0) - V(0, 2)]^2 + 4[V(1, 1)]^2   (5.21)

h_3 = [V(3, 0) - 3V(1, 2)]^2 + [V(0, 3) - 3V(2, 1)]^2   (5.22)

h_4 = [V(3, 0) + V(1, 2)]^2 + [V(0, 3) + V(2, 1)]^2   (5.23)
Affine Moment Invariants I

These four affine moment invariants were introduced by Flusser [43]:

I_{2,2} = (V(2, 0)V(0, 2) - V(1, 1)^2) / V(0, 0)^4   (5.24)

I_{3,4} = (V(3, 0)^2 V(0, 3)^2 - 6V(3, 0)V(2, 1)V(1, 2)V(0, 3) + 4V(3, 0)V(1, 2)^3 + 4V(2, 1)^3 V(0, 3) - 3V(2, 1)^2 V(1, 2)^2) / V(0, 0)^{10}   (5.25)

I_{3,2} = (V(2, 0)(V(2, 1)V(0, 3) - V(1, 2)^2) - V(1, 1)(V(3, 0)V(0, 3) - V(2, 1)V(1, 2)) + V(0, 2)(V(3, 0)V(1, 2) - V(2, 1)^2)) / V(0, 0)^7   (5.26)

I_{4,2} = (V(4, 0)V(0, 4) - 4V(3, 1)V(1, 3) + 3V(2, 2)^2) / V(0, 0)^6   (5.27)
Normalized Size

nsize = \frac{M(0, 0)}{J \cdot K}   (5.28)

A simple feature, but with large distinctive strength for some signs.

Center of Gravity

Calculated as in 5.13 and 5.14.
Eigenvalues of U

\lambda_1 = \frac{1}{2}[U(2, 0) + U(0, 2)] + \frac{1}{2}\sqrt{U(2, 0)^2 + U(0, 2)^2 - 2U(2, 0)U(0, 2) + 4U(1, 1)^2}   (5.29)

\lambda_2 = \frac{1}{2}[U(2, 0) + U(0, 2)] - \frac{1}{2}\sqrt{U(2, 0)^2 + U(0, 2)^2 - 2U(2, 0)U(0, 2) + 4U(1, 1)^2}   (5.30)

\lambda_{max} = \max[\lambda_1, \lambda_2]   (5.31)

\lambda_{min} = \min[\lambda_1, \lambda_2]   (5.32)
Eigenvalue Ratio R_A

R_A = \frac{\lambda_{min}}{\lambda_{max}}   (5.33)
Standard Deviation Moments m

These moments have been introduced by Mertzios and Tsirikolias in [42]. The 2D moment of grade p, q is defined as

m_{pq} = \frac{1}{LM} \sum_{x=1}^{L} \sum_{y=1}^{M} \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^p \left( \frac{y_i - \bar{y}}{\sigma_y} \right)^q F(x, y),   p = 1, 2, . . .,  q = 1, 2, . . .   (5.34)

where L and M denote the image dimensions. The basic idea is that the moment is normalised with respect to the standard deviation. The standard deviation is defined as:

\sigma_x = \sqrt{ \frac{1}{LM} \sum_{x=1}^{L} \sum_{y=1}^{M} (x - \bar{x})^2 F(x, y) }   (5.35)

The average \bar{x} stands for the centroid coordinate. Moments defined in this way are invariant under translation and magnification of the image, but not under rotation. The authors have recommended the use of the moments m_30, m_40, m_50, m_60, m_70, m_80 and their counterparts m_03 etc. The moments may be calculated on either binary or grey-level images.
Compactness

Compactness is calculated as:

comp = \frac{P_\Omega^2}{4 \pi A_\Omega}   (5.36)

where P_\Omega stands for the object perimeter and A_\Omega denotes the object's area. For circles the compactness comes close to unity, while elongated objects have values comp \in (1.0, \infty). This allows easy separation of circles from other objects. The perimeter can be found using standard mathematical morphological operations.
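To make the definitions concrete, the following is a minimal sketch computing a few of the features above for a binary image (the perimeter is estimated by a crude boundary-pixel count here, rather than by the morphological operation mentioned above):

```python
import numpy as np

def moment_features(F):
    # F: binary image (H x W array of 0/1). Returns the Hu invariants h1, h2
    # built from the normalised central moments (5.18-5.21) and the
    # compactness (5.36).
    ys, xs = np.nonzero(F)
    m00 = len(xs)
    xc, yc = xs.mean(), ys.mean()          # centre of gravity (5.13-5.14)

    def U(m, n):                           # central moment (5.12)
        return np.sum((xs - xc) ** m * (ys - yc) ** n)

    def V(m, n):                           # normalisation (5.18-5.19)
        return U(m, n) / m00 ** ((m + n) / 2 + 1)

    h1 = V(2, 0) + V(0, 2)
    h2 = (V(2, 0) - V(0, 2)) ** 2 + 4 * V(1, 1) ** 2

    pad = np.pad(F, 1)
    interior = (pad[1:-1, 1:-1] & pad[:-2, 1:-1] & pad[2:, 1:-1]
                & pad[1:-1, :-2] & pad[1:-1, 2:])
    perimeter = m00 - interior.sum()       # region pixels with a background neighbour
    comp = perimeter ** 2 / (4 * np.pi * m00)
    return h1, h2, comp
```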
5.9 Present State of the Art of Road Sign Recognition Research

Up to now, many algorithms for road sign detection and classification have been introduced. Road signs are often used as convenient real-world objects suitable for algorithm testing purposes. Papers focusing on the successful recognition of particular road signs by some special algorithm can be found in the literature, and they are a valuable source of information about different recognition approaches.

A number of articles that test various algorithms on the detection problem can be found in the reference list at the end of this document.

The use of optical correlators has also been reported by research groups:
Application of optical multiple-correlation to recognition of road signs: the ability of multiple-correlation, Matsuoka, Taniguchi, Mokuno, Optical Computing, Proceedings of the International Conference, 1995

On-board optical joint transform correlator for real-time road sign recognition, Guibert, Keryer, Servel, Attia, Optical Engineering, Vol. 34, Iss. 1, 1995

Scale-invariant optical correlators using ferroelectric liquid-crystal spatial light modulators, Wilkinson, Petillot, Mears, Applied Optics, Vol. 34, Iss. 11, 1995

Extensive research efforts have been funded by Daimler-Benz (now Daimler-Chrysler), whose research groups have reported papers concerning colour segmentation, parallel computation structure design and more. Their detection system is designed to use colour information for the sign detection, and the classification stage is covered by various neural-network or nearest-neighbour classifiers. The presence of colour is essential, and the system is unable to operate on images with weak or missing colour information. The most important advantage of the research groups supported by Daimler-Chrysler is their library of 60,000 traffic scene images used for system training and evaluation.
Chapter 6
My Implementation
6.1 Resulting system
The algorithm which has been implemented is quite fast: the average time for detection and classification of signs is about 200 ms on a 1 GHz computer. The detection phase is based on colour. This detection is quite rudimentary, but it has shown itself to be both fast and able to segment both day-time and night-time images. It might still fail if there are too many sign-like colours in the image. It will also fail if the sign is too small for a proper classification to be successful; this limit is reached when the sign is approximately 17 pixels wide. The worst case for processing time is when there are many red and yellow regions in the image, which produce lots of segments that have to be analysed. The worst-case processing time during the tests was about 400 ms.

Many methods have been tried during the realisation of this algorithm, and I have described the most important of them in the section on trial and error.
Figure 6.1: Screenshot Fifty
6.2 Introduction
For a successful analysis of the sign, it has to have a minimum width in number of pixels. This imposes requirements on the acquisition of the image in terms of camera resolution, distance from the camera location to the sign, directivity of the camera, and the factors imposed by the lens and zoom options of the camera.

The standard road sign is 64 cm wide. The minimum number of pixels over the width of the sign for a possible recognition is about 18. A probable resolution of the camera system is 640 pixels over the image width. Assume the field of view of the camera is 55 degrees. Then at 5 meters from the car the sign will have a width of 79 pixels and a classification should be possible. The minimum pixel width of 18 pixels occurs at 21 meters from the car.
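These numbers follow from simple pinhole-camera geometry with a flat image plane; a minimal sketch of the calculation (function and parameter names are illustrative):

```python
import math

def sign_width_pixels(distance_m, sign_width_m=0.64,
                      image_width_px=640, fov_deg=55):
    # Width of the visible scene at the given distance, assuming a flat
    # image plane, then the sign's share of it in pixels.
    scene_width_m = 2 * distance_m * math.tan(math.radians(fov_deg) / 2)
    return sign_width_m / scene_width_m * image_width_px

print(round(sign_width_pixels(5)))   # ~79 pixels at 5 m
print(round(sign_width_pixels(21)))  # ~19 pixels at 21 m, near the limit
```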
6.3 Influences caused by motion

As the sign gets closer to the car it occupies a bigger part of the view plane, but it will also be slightly more rotated: the angle between the line of sight from the car and the normal of the sign increases. This means that the sign gets more and more distorted as the car approaches. Equivalently, when the sign is far away from the car the angle between the line of sight and the normal is small, so the sign is less distorted. There is therefore a trade-off to be optimised between the distance to the sign and the angle of distortion; this should probably be done by experimentation.

The motion of the car can also cause vibrations in the images, and blurring can occur because of the motion of the objects captured in the images. The tests will reveal how severe these effects are and how they can be dealt with.
6.4 Practical tests
The tests should consist of evaluating the quality of the images that can be retrieved. For this purpose a camera at the top of the line among cameras existing today should be used; there is no point in using a camera only suitable for distances up to a couple of meters. The tests will evaluate whether the resolution is adequate for a possible detection. Different camera variables will also be tested. Most cameras have settings or options for brightness and gamma corrections; this can also be applied as a pre-processing step to the detection algorithm. Different settings for resolution can also be tried. A smaller resolution can mean faster detection since the number of pixels to be scanned is smaller; of course this can lead to a more difficult detection since the number of pixels covering the sign has been reduced. Focus settings will also be tried. The best focal distance might be 5 meters away from the car, since the sign is at its best position there. Since we are receiving a flow of images we must try to choose the best images to analyse. With today's computers it is not possible to do an extensive analysis of all images (up to 30 per second), so we must quickly scan through a few images per second and try to detect whether there might be a sign in the image. Lots of heuristics can be applied here to speed up the search as much as possible. When a possible sign is detected, another algorithm can be applied to see what sign it is. The practical tests should produce lots of material that can be used for testing different heuristics.

These tests have shown that the effects of blurring caused by motion and by vibrations of the car are not that severe. Successful detection when moving at 70 or 90 km/h is possible without the use of additional motion compensation. Otherwise there are optical stabilisation systems that can improve the image at the camera level; fast systems now exist for hand-held cameras, and similar systems could be used in RSR.
6.5 Hardware
6.5.1 The Camera
The camera used is a Logitech QuickCam Pro 3000 (figure 6.2), which has a true 640x480 pixel resolution and a colour depth of 24 bits. It has a lens aperture of F/2.0 and manual focus (you have to turn the ring on the camera manually). This means that the focus must be set to a specific range where the sign is expected to be located. This has proven not to be a big problem: as long as the focus is set for a few meters, the sign will appear with enough clarity for a successful classification even if it is tens of meters away. The focus setting is thus not one of the most critical factors of the system.
Figure 6.2: The camera used for the experiments
The camera which has been chosen can be accessed using standard TWAIN interfaces. Even if this seems to be a quite slow solution, as demonstrated by the initial testing, we can still evaluate the performance of the detection system by analysing the time needed to analyse the images captured from the camera. A reasonable demand on a system of this type is that the sign should be detected before the car passes it. A speed limit sign imposes a speed restriction which takes effect at the sign; thus the system can warn the driver if he is going too fast at that point. Preferably the warnings should come a bit sooner, to give the driver time to slow down. A crude detection algorithm might find possible signs at a long distance and, before it is able to correctly decode the sign, warn the driver that a speed limit sign might be approaching, thus giving the driver adequate time to react.
6.5.2 The Computer
The computer used for the experiments is a portable 450 MHz machine with 128 MB of RAM, a 2x16 kB internal (code+data) cache and 256 kB of external cache. The testing is done under Windows 2000. This computer and the camera have been mounted in a testing vehicle for image gathering and testing purposes.
6.6 The Program
Figure 6.3: Simplified class diagram showing the active classes
This represents almost 10,000 lines of code. The algorithm proceeds largely as follows.

The first thing done with an image is to extract the regions containing colours belonging to signs. The red regions are found by first extracting all pixels in the image that meet the following requirements (a sketch of the test is given after the list):

The red component of the RGB value is at least 35 (of 255).

The ratio between blue and red is at most 0.69: blue/red < 0.69, so the influence of blue is limited.

The ratio between green and red is at most 0.64: green/red < 0.64.
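A minimal sketch of this pixel test (the function name is illustrative; the thresholds are the first-pass values, which are relaxed as described below):

```python
def is_red_sign_pixel(r, g, b, min_red=35,
                      max_blue_ratio=0.69, max_green_ratio=0.64):
    # Red component must be at least min_red; blue and green must be small
    # relative to red (the ratio tests are rewritten without division).
    if r < min_red:
        return False
    return b < max_blue_ratio * r and g < max_green_ratio * r
```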
Figure 6.4: Localizing signs
If too few pixels were found that met these requirements, the lighting conditions in the image might be such that lower limits are required, so the values are changed to 20 for the red value, 0.76 for blue/red and 0.74 for green/red. Even lower values might be needed for images where a very bright sky dominates the image and the iris is almost closed, so that the colours in the sign become very dark; then the values are lowered even more. If still too few pixels meeting the requirements are found, the lighting conditions might be very bad, but a segmentation is nevertheless attempted after a dilation of the pixels found, to increase the probability that a potential sign is found.

If too many pixels are found, it is probable that the signs in the image have a relatively big size, or that the colour classification was not successful in selecting only the best matching pixels. Thus we can erode the binary image to increase the probability of a correct segmentation in the following steps.
The binary image is segmented and overlapping regions are joined. If joined regions represent two signs, this will be detected in subsequent steps.
Figure 6.5: Classifying signs
If no region bigger than the smallest possible sign was found, the pixels found are dilated and the image is resegmented.
Even though this is a rudimentary and heuristic method, it has proven to be fast and successful even in poor lighting conditions.
Figure 6.6: Red regions marked.
The classification of yellow pixels is done in the same fashion. Here the minimum-value limits apply to both red and green, and blue must not be too big in comparison with the other values.
Figure 6.7: Yellow regions marked.
The two segmented images are merged so that potential signs can be marked. Ideally we would like to find a yellow region that covers about 75% of an overlapping red region. This is rarely the case, since signs might be over-segmented or under-segmented. The procedure used is as follows:
For all yellow regions do
  For all red regions do
    If the yellow region is within the red region, then add a region using the yellow region extended by a normal sign border, which is 16% of the height of the yellow region. Also add the red region if it is approximately square shaped.
    If a yellow region only partly overlaps the red region, a new region is formed by merging the red and yellow regions into a new sign region.

Yellow regions are also compared to one another to find signs that have been split somehow in the segmentation process and need to be put back together. Red regions which lie close to yellow regions are also examined, so that an under-segmented sign can still be found by extending the region to include the probable sign area.
The number of possible segmentation regions is expected to be small, so the correctness of the segmentation is not verified at this stage, except that too small or too big segments are removed, as well as those extending beyond the borders of the image.
Figure 6.8: Example of a segmentation of a partly hidden sign
Figure 6.9: Example of a segmentation of a sign against a red ceiling in the background
Figure 6.10: Example of a segmentation of a nighttime image
6.6.1 Internal Segmentation of the Sign
Next follows the internal segmentation, i.e. finding the numbers that might be there if the sign is a speed sign.

All the pixels in the internal region of the sign area are sorted and the darkest 30% are kept. This is likely to include only the pixels belonging to digits, since the yellow background is lighter (higher values in the RGB space). The remaining pixels are segmented and labelled into regions. Ideally each digit would be one segment.

Possible digit regions are searched for within the segmented region, starting at possible locations for speed sign digits. If no regions are found that are big enough and in the right place for standard digits, small regions are joined until the most likely digit region is found.

Each digit is magnified by a factor of 4 to facilitate the following classification process. This is done because the analytical models of digits that have been created perform better on larger objects containing more pixels. It can be said that the models have sub-pixel resolution, so the correlation will produce better results if the digits found are enlarged.
Figure 6.11: Segmentation of the internal digits
6.6.2 OCR for the Digits
To be able to correctly classify the found digits, a model matching method is used. Mathematical models of every possible digit have been created. This has the benefit that they can be created in any wanted size without loss of precision. Correlations for each digit are calculated using a model created with the same size as the segment found in the sign. The correlation calculation is done using a Yule distance defined as:

distance_{Yule} = \frac{n_{11} n_{00} - n_{10} n_{01}}{n_{11} n_{00} + n_{10} n_{01}}   (6.1)
where n_{10} is the number of pixels where the model has a 1 but there is a 0 in the image, and so on. This, together with the Jaccard distance, d_J = n_{11} / (n_{11} + n_{10} + n_{01}), was found to be among the best model matching distances by Trier et al. [44].
Figure 6.12: The mathematical models of the digits matched in a 50 sign
Figure 6.13: The digits created (the digits that can be found in speed signs)
I have tried them both and found that the Yule distance gives the best correlation values, and the lowest values when there is a mismatch.
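A minimal sketch of the two distances for binary model and image patches of equal size (n11 counts positions where both are 1, n00 where both are 0, and so on):

```python
import numpy as np

def yule_distance(model, image):
    # Yule measure (6.1) for two binary arrays of the same shape.
    n11 = np.sum((model == 1) & (image == 1))
    n00 = np.sum((model == 0) & (image == 0))
    n10 = np.sum((model == 1) & (image == 0))
    n01 = np.sum((model == 0) & (image == 1))
    return (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)

def jaccard_distance(model, image):
    # Jaccard measure d_J mentioned above.
    n11 = np.sum((model == 1) & (image == 1))
    n10 = np.sum((model == 1) & (image == 0))
    n01 = np.sum((model == 0) & (image == 1))
    return n11 / (n11 + n10 + n01)
```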
A voting principle is used to pick the actual digit, and the difference to the second highest vote is used as a confidence value.
6.6.3 Results
Times to analyse and (correctly) classify the following images:
Image Time to analyze (s)
normal 50 radie 14.bmp 0.203
normal 70 radie 14.bmp 0.094
normal 30 radie 17.bmp 0.140
dusk 50 20 m radie 14.bmp 0.250
dusk 50 10 m radie 24.bmp 0.109
winter 50 radie 19.bmp 0.218
winter 70 close radie 19.bmp 0.156
winter 70 far radie 9.bmp 0.219
50 darker 7 m radie 13.bmp 0.216
30 full light radie 20.bmp 0.170
50 summer day radie 8.bmp 0.422
90 bright sky radie 12.bmp 0.219
90 bright sky radie 19.bmp 0.187
70 cloudy day radie 14.bmp 0.110
30 trainstation radie 9.bmp 0.094
30 trainstation radie 11.bmp 0.281
temporal 2 50 trainstation radie 8.bmp 0.234
temporal 3 50 trainstation radie 12.bmp 0.109
temporal 4 50 trainstation radie 16.bmp 0.109
temporal 5 50 trainstation radie 25.bmp 0.125
30 radie 12 full light.bmp 0.156
30 trainstation radie 12.bmp 0.109
dusk 70 30 m radie 9.bmp 0.125
These tests have been performed off-line on a 1 GHz Intel processor.

Discussion and Analysis
It can be seen that the normal time to analyse an image is about 200 ms, which is acceptable for real-time performance. For a real-time algorithm it is important that the processing time can be bounded. The worst-case time is reached with difficult images where background colours interfere with the segmentation process, and might be over 400 milliseconds. An absolute bound is impossible to calculate, since it depends on the number of potential sign regions found. Also, if a big portion of the pixels in the image is classified as red or yellow, the lists of classified pixels will be long. The sorting of these pixels is done by insertion into a sorted list and could be improved by using a hashing approach. The strategy used to avoid a too costly sorting is the three-stage approach. A possible improvement would be to measure a representative part of the image to find probably good lighting settings, which would guarantee that the list is bounded.
Another possible improvement would be to use the same parameters for the colour classification as for the previous image, since lighting conditions are expected to be the same between two consecutive images. Then the colour segmentation would only have to be applied once, contrary to when the conditions are not known and the process might have to be applied up to three times before good parameters are found.

The colour classification might also be sped up by using a histogram-based approach. Red colours could be collected in a histogram and the brightest extracted; the same could be done for the yellow colours.
Regions of interest could also improve the processing time: if a sign is found at one location in the image, it is most likely in about the same area in subsequent images.

The algorithm fails for images where the sign is too small and where there are too many red regions in the background with colours similar to the sign paint. Often such images can still be segmented well thanks to the yellow regions, but if there also are lots of yellow regions, this too can make the algorithm fail. If the sign region found is not accurate enough, the internal segmentation might also fail. Often the sign can still be found by using the temporal sequence: as the perspective between the camera and the sign changes while the car approaches, red regions in the background might have less influence in later images.
6.7 Trial and Error: Experiences Learned During the Process

During the course of this work, many different image processing methods have been implemented and tested. In this section I share the experiences gained from these experiments.
6.7.1 Color Segmentation
My first implementation of the colour segmentation algorithm was to use a spherical region in the RGB 3D colour space:

(Red - CenterR)^2 + (Green - CenterG)^2 + (Blue - CenterB)^2 < ThresholdRed   (6.2)

I used a dialog with sliders to adjust the centre of the sphere and its size. This method can extract the colours well in any single type of image, but since differences in light conditions can change the values of the red sign colours, the sphere has to have a large diameter to be able to include all variations. This makes it impractical, since a large sphere makes the segmentation unsuccessful. A dynamically changing sphere depending on measured light conditions could be considered, but a light sky in the background will cause the sign to appear very dark, so an absolute measure will not be successful.
Figure 6.14: Red sign region found
Figure 6.15: The yellow region extracted using values RGB = [216 178 5] and
a distance of 9000.
Figure 6.16: A dark image such as this will have completely different settings, as shown by the following images where the regions have been extracted
Figure 6.17: Red regions extracted using centre RGB = [99 73 66] and a diameter
value of 450.
Figure 6.18: Yellow regions extracted using centre RGB = [124 109 76] and a
diameter value of 750.
I have also tried a threshold algorithm in the HSI colour space. For this I have used a multiple-threshold approach. Since red colours can be found both at high hue values and at low ones, a red pixel must be below a threshold Hue_LOW or above a threshold Hue_HIGH. Saturation and intensity must lie between an upper and a lower limit. Example results can be seen in the following images. Figure 6.19 shows the red regions found using the HSI colour space with

Hue_LOW = 25°,  Hue_HIGH = 313°
Saturation_MIN = 0.09,  Saturation_MAX = 0.92
Intensity_MIN = 0.15,  Intensity_MAX = 0.87
Figure 6.20 shows a daylight image using the same settings.
Figure 6.19: Red regions found using HSI colour space.
Figure 6.20: Red regions in a daylight image using the same settings.
It can be seen that this produces a much better result than the RGB approach, since the two images are almost segmentable using the same thresholds. This is not possible with RGB, where a dark image will be without red pixels using the settings applied to a daytime image. The HSI thresholds are not as sensitive as the RGB ones either: a small change in the hue value will not affect the result as much as a small change in the RGB sphere settings could.
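A minimal sketch of the multiple-threshold test (Python's standard HSV conversion is used here as a stand-in for a true HSI transform, which is an assumption and shifts the exact saturation/intensity values slightly):

```python
import colorsys

def is_red_hsi(r, g, b,
               hue_low=25.0, hue_high=313.0,
               sat_range=(0.09, 0.92), int_range=(0.15, 0.87)):
    # r, g, b are expected in [0, 1]; hue is converted to degrees.
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360.0
    red_hue = hue_deg < hue_low or hue_deg > hue_high
    return (red_hue
            and sat_range[0] <= s <= sat_range[1]
            and int_range[0] <= v <= int_range[1])
```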
For the segmentation process I have also tried to use a slowly varying average threshold to minimise the spill-over which can occur if the sign is in front of something red; the red region might then be expanded into the red background and the segmentation will therefore fail. The assumption is that the colours of the sign vary within a small interval which hopefully does not include colours outside the sign. An average colour is computed from the previous pixels classified as red, and new pixels are only accepted if they are close enough to this average (within a small sphere centred on the average colour). This method is not successful when part of the sign is in shadow, or when the sign is tilted so that it reflects the light a bit differently on different parts of the sign.
6.7.2 Filtering/Feature extraction
Following Rehrmann [15], a filtering approach can be used to remove noise in the image and to increase the classification precision. A simple linear filter like the averaging filter, which replaces the centre pixel by a weighted average of its neighbourhood pixels, has the drawback of blurring the edges. Thus a non-linear filter that adapts to local changes in the structure of the underlying image signal is preferred.

Median Filter

The problem with the median filter in colour images is the lack of a linear order on colour vectors. The median filter cannot be applied individually to each colour band, since then completely new colours might arise. One possibility for transferring the median filter to colour images is to search for the pixel p whose sum of colour distances to all other pixels is minimal; the colour value of the central pixel is then replaced by that of p. Choose k \in \{0, . . . , N\} such that
\sum_{j=0}^{N} ||c_k - c_j|| = \min_{0 \le i \le N} \left\{ \sum_{j=0}^{N} ||c_i - c_j|| \right\}

where N is the size of the neighbourhood. Then f(c_0)_{new} = f(c_k).
K-Nearest-Neighbour Filter

In KNN filtering, all the colour values in the neighbourhood are compared with the central pixel's colour value, and the k pixels having the closest colour values are determined. The central pixel is finally replaced by the equally weighted average of the k nearest neighbours. Let v_1 = (r_1, g_1, b_1), . . . , v_N = (r_N, g_N, b_N) be the colour values of the neighbourhood of a central pixel c_0, ordered so that ||f(c_0) - v_1|| \le ||f(c_0) - v_2|| \le . . . \le ||f(c_0) - v_N||. Then

f(c_0)_{new} = \frac{1}{k} \left( \sum_{i=1}^{k} r_i, \sum_{i=1}^{k} g_i, \sum_{i=1}^{k} b_i \right)
Symmetric Nearest Neighbour Filter
Figure 6.21: Hexagonal pixel arrangement, [15]
Figure 6.22: Hexagonal structure on top of the orthogonal raster, [15]
A hexagonal rastering of pixels would have the pixels ordered as in figure 6.21. This hexagonal structure can be used as a logical structure on top of an orthogonal raster; figure 6.22 shows a possible partition of an image into a hexagonal structure. Pietikainen and Harwood [45] suggested an SNN technique for noise cleaning. Use the three centre-symmetric pairs (c_1, c_6), (c_2, c_5) and (c_3, c_4) in the hexagonal neighbourhood surrounding pixel c_0. From each pair, choose the pixel whose colour value is closer to the central pixel's colour value. The central pixel is then replaced by the equally weighted average of the three determined neighbours. Let v_i = (r_i, g_i, b_i), 1 \le i \le 3, be the three colour vectors with

v_1 = f(c_1) if ||f(c_0) - f(c_1)|| \le ||f(c_0) - f(c_6)||, otherwise f(c_6)
v_2 = f(c_2) if ||f(c_0) - f(c_2)|| \le ||f(c_0) - f(c_5)||, otherwise f(c_5)
v_3 = f(c_3) if ||f(c_0) - f(c_3)|| \le ||f(c_0) - f(c_4)||, otherwise f(c_4)
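A minimal sketch of the SNN rule for a single pixel (the caller supplies the three centre-symmetric neighbour colour pairs from the hexagonal structure):

```python
import numpy as np

def snn_filter_pixel(f, pairs):
    # f: colour of the central pixel c0; pairs: list of the three
    # centre-symmetric neighbour colour pairs, e.g. [(c1, c6), (c2, c5),
    # (c3, c4)].
    f = np.asarray(f, dtype=float)
    chosen = []
    for a, b in pairs:
        a, b = np.asarray(a, float), np.asarray(b, float)
        # From each pair keep the neighbour closer to the central colour.
        chosen.append(a if np.linalg.norm(f - a) <= np.linalg.norm(f - b) else b)
    return np.mean(chosen, axis=0)   # equally weighted average
```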
Comparison

The two crucial criteria for the choice of the right filter are its complexity and the improvement of the segmentation results. The following table shows the number of colour distance calculations, comparisons, additions and multiplications resp. divisions for a neighbourhood of 6 pixels. The fastest filter is obviously the SNN filter.

Filter   Colour Distances   Comparisons   Additions   Mult./Div.
Median   12                 6             49          0
KNN      6                  21            3k          3
SNN      6                  3             9           3
These filters have been tested for segmentation of images, and the tests have shown that the KNN and SNN filters lead to comparable results, which were better than those of the colour median filter. The results have also been confirmed in a concrete application (traffic sign recognition), where the influence of the filters on the recognition rates has been analysed with more than 5000 images ([46]). This would mean that the choice of filter is clear: the SNN filter is not only the fastest but also the one leading to the best segmentation results.

I have incorporated the SNN filter in the recognition program and found that images can be segmented better and the classification is more reliable. However, when the segment of interest (the sign) gets too small, the blurring that still occurs reduces the classification result. The result would be better if a true hexagonal layout could have been used. The time to filter a 320x240 image is about 60 ms. For the final application I have not included this filter, since the algorithm is not that noise sensitive; also, a filtered image will in some cases have a longer recognition time.
Figure 6.23: Result of applying the SNN filter on a noisy image
Figure 6.24: Original image, filtered with SNN, and the last filtered with a 3x3 averaging filter. The results are scaled about 800 times. The advantage of the SNN over an averaging approach is clear.
6.7.3 Using an Edge Image
I have also performed some experiments on extracting the edges of the sign in the image. I have modified a Laplace edge detector to be used directly in a colour image. I have used a 3x3 mask with the following coefficients:

-1 -1 -1
-1  8 -1
-1 -1 -1

This is applied to each colour band separately to produce three values [red, blue, green]. These are multiplied by a scaling vector defined as [0.3, 0.11, 0.59]. If the resulting sum (0.3 red + 0.11 blue + 0.59 green) is larger than 97, the centre pixel is defined to be an edge pixel. This is a heuristic threshold which has given good results.
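A minimal sketch of this colour edge detector, assuming SciPy is available for the 2D convolution:

```python
import numpy as np
from scipy.ndimage import convolve

LAPLACE = np.array([[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]], dtype=float)

def colour_laplace_edges(rgb, threshold=97.0):
    # Apply the mask to each band, weight the responses by
    # [0.3, 0.11, 0.59] for (red, blue, green), and threshold the sum.
    r = convolve(rgb[..., 0].astype(float), LAPLACE)
    g = convolve(rgb[..., 1].astype(float), LAPLACE)
    b = convolve(rgb[..., 2].astype(float), LAPLACE)
    return (0.3 * r + 0.11 * b + 0.59 * g) > threshold
```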
An example of this applied to an image can be seen in figure 6.25.
Figure 6.25: Result of applying Laplace Edge detector
This image is still a bit noisy, and it might be difficult and computationally complex to find signs among these edges. A better approach is to first extract the red colours and then apply the edge detector. An example of this can be seen in figure 6.26.
Figure 6.26: The edges of the red regions, edge filtered with a Laplace edge detector.
Figure 6.27: Algorithm using edges for the detection [47]
A possible approach using this kind of information is implemented by Zadeh, Kasvand and Suen [47]. The input image is thresholded, keeping only sign colours. To remove noise, a mild erosion is performed so as to preserve small (distant) road signs. This is followed by a labelling process. Coloured regions and adjacent coloured regions are potential road signs, depending on the colour combination found. The next phase concentrates on the labelled regions, where a geometrical analysis is performed. It is known that the outer edge of a traffic sign is either round or consists of a piece-wise sequence of linear sections. The angles between the linear sections are known and, except for the rectangle, the length ratios between the sides (and their slopes) are also known. By tracking the outer edge of a region and subsequently analysing the edge, camouflaging parts can be removed; the outer edges of camouflaging regions are often ragged and can thus be found. It is possible to detect and separate circular and linear regions. Similar logic can be applied to the outer edge of regions that may be missing from the sign due to something blocking it.

If a significant part of the edge is circular, it is possible that a circular sign has been detected. Likewise, if straight edges obey the rules of traffic signs (angles between and length ratios of straight sections), the region is likely to be a sign of a given type. The colour information is also available, and thus it is possible to find which group of signs the region belongs to. Further processing includes model matching and colour analysis to recognise the sign.
Figure 6.28: The use of edges for road sign recognition [47]
6.7.4 Compensating for a Rotated/Tilted sign
Since the model matching assumes that the model and the template have the same rotation angle, I implemented an algorithm for detecting whether the sign is rotated, and for compensating for this. The calculation of the angle of rotation is based on moments. The moment of order (p + q) for a discrete image is defined as

m_{pq} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} i^p j^q F(i, j)   (6.3)

where i, j are the pixel coordinates in the image.
The coordinates (x_c, y_c) of the region's centre of gravity are then found as follows:

x_c = \frac{m_{10}}{m_{00}},   y_c = \frac{m_{01}}{m_{00}}

m_{00} is in fact the region's area. Using the centre of gravity we can define central moments (which are translation invariant):

\mu_{pq} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} (i - x_c)^p (j - y_c)^q F(i, j)   (6.4)
For elongated regions a direction can be found, defined as the direction of the longer side of a minimum bounding rectangle. When the central moments of the object are known, the direction can be found using this formula:

\theta = \frac{1}{2} \tan^{-1} \left( \frac{2\mu_{11}}{\mu_{20} - \mu_{02}} \right)   (6.5)

A minimum bounding box can then be found from the boundary points. By defining

\alpha(x, y) = x \cos\theta + y \sin\theta   (6.6)

\beta(x, y) = -x \sin\theta + y \cos\theta   (6.7)

the minimum and maximum of \alpha and \beta are searched for by testing all boundary points (x, y). The values \alpha_{min}, \alpha_{max}, \beta_{min}, \beta_{max} define the bounding rectangle (shown in figure 6.29). Its length and width can be used as feature values and are l_1 = \alpha_{max} - \alpha_{min} and l_2 = \beta_{max} - \beta_{min}.
Rotation is performed by finding new positions for every black pixel within the bounding rectangle using standard rotation formulas. This has not been included in the final version, since as long as the internal segmentation is precise the model matching will give good results even if the sign is slightly tilted. Severely tilted signs should not exist, since the Swedish road authority corrects faults like that. Still, the algorithm is effective in correcting tilted digits, and correctly rotated digits give higher correspondence values in the subsequent model correlation.
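A minimal sketch of the direction estimate (6.5) and of the coordinate rotation used for compensation (arctan2 is used here to keep the correct quadrant, a common refinement of 6.5):

```python
import numpy as np

def region_direction(mask):
    # Moment-based orientation of a binary region: the central moments
    # mu11, mu20, mu02 give the direction of the principal axis (eq. 6.5).
    ys, xs = np.nonzero(mask)
    xc, yc = xs.mean(), ys.mean()
    mu11 = np.sum((xs - xc) * (ys - yc))
    mu20 = np.sum((xs - xc) ** 2)
    mu02 = np.sum((ys - yc) ** 2)
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

def rotate_points(xs, ys, theta, xc, yc):
    # Rotate pixel coordinates by -theta about the centroid to straighten
    # the region (standard rotation formulas).
    c, s = np.cos(-theta), np.sin(-theta)
    return (c * (xs - xc) - s * (ys - yc) + xc,
            s * (xs - xc) + c * (ys - yc) + yc)
```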
A problem is that the digit 7 is rotated too much by the algorithm, since the calculated direction will not be the correct direction according to our preferences. The solution is to use the direction calculated for the 0, which must be the
Figure 6.29: Rotation of the digits found and compensated for.
same, and rotate the digit with the \theta found in that computation. An example of this can be seen in figure 6.30.
Figure 6.30: Original 7 which is tilted. 7 rotated with the direction found in the
7 segment. Rotation of the original 7 with the angle found in the 0 segment.
6.7.5 Discriminators between digits
I have implemented some algorithms that calculate discriminating factors for the digits. Calculation of compactness has been tried. Compactness is a popular shape description characteristic, independent of linear transformations, and is defined as:

compactness = \frac{(\text{region border length})^2}{4 \pi m_{00}}   (6.8)

The most compact region in a Euclidean space is then a circle. Compactness assumes values in the interval [1, \infty).
Figure 6.31: Compactness: (a) compact; (b) non-compact
For the numbers occurring in speed signs this has quite large discriminating power. As expected, the zero has the lowest compactness value and 5 the highest (almost three times that of the zero). The following values have been obtained using digits of the same size (the divisor 4\pi has been left out):

Digit   c^2/A
0       22.1
1       32.4
3       54.4
5       58.9
7       45.4
9       30.4
This is calculated using the inner border. To be able to use this independently of linear transformations, it should be changed to use the outer border. The inner border is found by tracing the boundary using the following algorithm (from Sonka [51], pp. 142):
1. Search the image from top left until a pixel of a new region is found; this pixel P_0 then has the minimum column value of all pixels of that region having the minimum row value. Pixel P_0 is a starting pixel of the region border. Define a variable dir which stores the direction of the previous move along the border, from the previous border element to the current border element. Assign
(a) dir = 3 if the border is detected in 4-connectivity (figure 6.32 (a))
(b) dir = 7 if the border is detected in 8-connectivity (figure 6.32 (b))

2. Search the 3x3 neighbourhood of the current pixel in an anti-clockwise direction, beginning the neighbourhood search at the pixel positioned in the direction
(a) (dir + 3) mod 4
(b) (dir + 7) mod 8 if dir is even, (dir + 6) mod 8 if dir is odd
The first pixel found with the same value as the current pixel is a new boundary element P_n. Update the dir value.

3. If the current boundary element P_n is equal to the second border element P_1, and if the previous border element P_{n-1} is equal to P_0, stop. Otherwise repeat step 2.

4. The detected inner border is represented by pixels P_0 . . . P_{n-2}.

The outer border can be detected using the following modification:

1. Trace the inner region boundary in 4-connectivity until done.

2. The outer boundary consists of all non-region pixels that were tested during the search process; if some pixels were tested more than once, they are listed more than once in the outer boundary list.
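A compact sketch of the 8-connectivity variant, assuming a NumPy boolean mask; the directions are numbered anti-clockwise starting from east, as in figure 6.32 (b):

```python
import numpy as np

# 8-connectivity direction codes, anti-clockwise from east, as (row, col) offsets
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def trace_inner_border(mask):
    ys, xs = np.nonzero(mask)
    start = min(zip(ys, xs))               # minimum row, then minimum column
    border, cur, dir_ = [start], start, 7  # step 1: P0 and initial dir
    while True:
        # step 2: starting direction for the anti-clockwise neighbourhood scan
        base = (dir_ + 7) % 8 if dir_ % 2 == 0 else (dir_ + 6) % 8
        for i in range(8):
            d = (base + i) % 8
            ny, nx = cur[0] + DIRS[d][0], cur[1] + DIRS[d][1]
            if 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1] and mask[ny, nx]:
                border.append((ny, nx))
                cur, dir_ = (ny, nx), d
                break
        else:
            return border                  # isolated pixel, no neighbours
        # step 3: stop when P_n == P_1 and P_{n-1} == P_0
        if len(border) >= 4 and border[-1] == border[1] and border[-2] == border[0]:
            return border[:-2]             # step 4: inner border P_0 .. P_{n-2}
```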
Figure 6.32: Inner boundary tracing: (a) direction notation, 4-connectivity; (b)
8-connectivity
Chapter 7
The Future of RSR/ITS
RSR systems will probably soon become a valuable part of upcoming driver support systems. Some experts say that the first generation of individual driver information and warning systems will appear on the market in five to ten years; Daimler-Chrysler will probably be one of the first on the market.

The enhancement of future vehicles can be achieved by acting both on the infrastructure and on the vehicles. Depending on the specific application, either choice has advantages and drawbacks. Enhancing the road infrastructure may yield benefits to transportation architectures based on repetitive and pre-scheduled routes, such as public transportation and industrial robotics. On the other hand, extended road networks for private vehicles would require a complex and extensive organisation and maintenance, which may become cumbersome and extremely expensive: an ad hoc structuring of the environment can only be considered for a reduced subset of the road network, for example a fully automated highway on which only automatic vehicles (public or private) can drive.

The promising results obtained in the first stages of research on intelligent vehicles demonstrate that a full automation of traffic (at least on motorways or sufficiently structured roads) is technically feasible. Nevertheless, besides technical problems, some issues must be carefully considered in the design of these systems, such as legal aspects related to responsibility in case of faults and incorrect behaviour of the system, and the impact of automatic driving on human passengers. User acceptance in particular will play a critical role in how intelligent vehicles will look and perform, and the system interface will have a strong influence on how a user will view and understand the functionality of the system.

Therefore a long period of exhaustive tests and refinement must precede the availability of these systems on the general market, and a fully automated highway system with intelligent vehicles driving and exchanging information is not expected for a couple of decades.

For the time being, complete automation will be restricted to special infrastructures such as industrial applications or public transportation. Then automatic vehicular technology will gradually be extended to other key transportation areas such as goods shipping, for example on expensive trucks, where the cost of an autopilot is negligible with respect to the cost of the vehicle itself and the service it provides. Finally, once the technology has stabilised and the most promising solutions and best algorithms are fixed, a massive integration and widespread use of such systems will take place in private vehicles, but this will not happen for another two or more decades.
Bibliography
[1] C. Little, The intelligent vehicle initiative: Advancing Human-Centered
smart vehicles, Public Roads Mag., vol 61, no 2, pp 18-25 Sept/Oct 1997
[2] M. Cellario, Human-centered intelligent vehicles: Toward multimodal interface integration, IEEE Intell. Syst., vol. 16, no. 4, pp. 78-81, July/Aug. 2001
[3] Vehicle-highway automation activities in the United States, U.S. Dept of
Transportation, 1997
[4] H. Tokuyama, Asia-pacic projects status and plans U.S. Dept of Trans-
portation
[5] M.C. Hulse et al., Development of human factors guidelines for advanced traveler information systems and commercial vehicle operations: Identification of the strengths and weaknesses of alternative information display formats. Federal Highway Administration, Washington, DC, Tech. Rep. FHWA-RD-96-142, 1998
[6] U. Seger, H.G. Graf and M.E. Landgraf, Vision assistance in scenes with extreme contrast, IEEE Micro, vol. 13, pp. 50-56, Jan/Feb 1993
[7] C.G. Sodini and S.J. Decker, A 256x256 CMOS Brightness adaptive imag-
ing array with column-parallel digital output, in Proc IEEE Int. Conf.
Intelligent Vehicles, 1998, pp.347-352
[8] M. Mizuno, K. Yamada, T. Nakano, and S. Yamamoto, Robustness of lane mark detection with wide dynamic range vision sensor, in Proc. IEEE Int. Conf. Intelligent Vehicles, 1995, pp. 171-176
[9] R. Chapuis, R. Aufrère, F. Chausse, and J. Alizon, Road sides recognition under unfriendly lighting conditions, in Proc. IEEE IV, 2001, pp. 13-18
[10] M. Lützeler and E.D. Dickmanns, Road recognition with MarVEye, in Proc. IEEE IV, 1998, pp. 341-346
[11] R.Risack, P.Klausmann, W. Kruger and W.Enkelmann, Robust lane recog-
nition embedded in a real-time driver assistance system. In Proc IEEE IV,
1998, pp 35-40.
[12] E.D. Dickmanns, Expectation-based multi focal vision for vehicle guidance,
in Proc 8th Eur. Signal Processing Conf., 1995, pp.1023-1026
[13] T.M. Jochem and S. Baluja, A massively parallel road follower, in Proc. IEEE Computer Architectures for Machine Perception, M.A. Bayoumi, L.S. Davis, and K.P. Valavanis, Eds., 1998, pp. 2-12
[14] D.M. Gavrila, Multi-feature Hierarchical Template Matching Using Dis-
tance Transforms, Proc IEEE International Conference on Pattern Recog-
nition, Brisbane, Australia, 1998
[15] V. Rehrmann and L. Priese, Fast and Robust Segmentation of Natural Color Scenes, Image Recognition Lab, University of Koblenz-Landau, Germany
[16] A. Broggi, G. Conte, F. Gregoretti, C. Sansoè, and L.M. Reyneri, The evolution of the PAPRICA system, Integr. Computer-Aided Eng. J., vol. 4, no. 2, pp. 114-136, 1997
[17] F. Perez and C. Koch. Toward color image segmentation in analog VLSI: Algorithm and hardware. Int. J. of Computer Vision, 12(1):17-42, 1994
[18] D. Krumbiegel, K.-F. Kraiss, and S. Schrieber. A connectionist traffic sign recognition system for onboard driver information. In 5th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design and Evaluation of Man-Machine Systems 1992, pages 201-206, 1993
[19] M.-P. Dubuisson and A.Jain. 2D matching of 3D moving objects in color
outdoor scenes. In IEEE Int. Conf. CVPR94, pages 887-891, 1994
[20] R. Ghica, S. Lu, and X. Yuan. Recognition of traffic signs using a multilayer neural network. In Proc. Canadian Conf. on Electrical and Computer Engineering, 1994
[21] N.Kehtarnavaz, N.C.Griswold, and D.S.Kang. Stop-sign recognition based
on colour-shape processing. Machine Vision and Applications, 6:206-208,
1993
[22] L. Priese and V. Rehrmann. On hierarchical color segmentation and applications. In Proc. CVPR 1993, pages 633-634, 1993
[23] L. Priese, J. Klieber, R. Lakmann, V. Rehrmann and R. Schian. New results on traffic sign recognition. In IEEE Proc. Intelligent Vehicles '94 Symposium, pages 249-253, 1994
[24] G. Nicchiotti, E. Ottaviani, P. Castello, and G. Piccioli. Automatic road sign detection and classification from color image sequences. In S. Impedovo, editor, Proc. 7th Int. Conf. on Image Analysis and Processing, pages 623-626. World Scientific, 1994
[25] D.Kellmeyer and H.Zwahlen. Detection of highway warning signs in natural
video images using color image processing and neural networks. In IEEE
Proc. Int. Conf.Neural Networks 1994, volume 7, pages 4226-4231, 1994
[26] M. De Saint Blancard. Road sign recognition: A study of vision-based decision making for road environment recognition. In Vision-based Vehicle Guidance, Springer Series in Perception Engineering. Springer-Verlag, 1992
[27] B.Besserer, S.Estable, and B.Ulmer. Multiple knowledge sources and evi-
dential reasoning for shape recognition. In Proc. IEEE 4th Conference on
Computer Vision, pages 624-631, 1993
[28] P. Suetens, P. Fua, and A.J. Hanson. Computational strategies for object recognition. ACM Computing Surveys, 24(1):5-61, March 1992
[29] Y. Lamdan and H. Wolfson. Geometric hashing: a general and efficient model-based recognition scheme. In Proc. 2nd Int. Conf. on Computer Vision, pages 238-249. IEEE, 1988
[30] G. Piccioli, E. De Micheli and M. Campani. A robust method for road sign detection and recognition. In Proc. European Conference on Computer Vision 1994, pages 495-500, 1994
[31] H. Akatsuka and S. Imai. Road signposts recognition system. In Proc. SAE Vehicle Highway Infrastructure: Safety Compatibility, pages 189-196, 1987
[32] S. Estable, J. Schick, F. Stein, R. Janssen, R. Ott, W. Ritter, and Y.-J. Zheng. A real-time traffic sign recognition system. In Proc. Intelligent Vehicles '94, pages 213-218, 1994
[33] G.Borgefors. Hierarchical chamfer matching: A parametric edge matching
algorithm. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 10(6):849-865, November 1988
[34] G. Piccioli et al. Robust method for road sign detection and recognition. Image and Vision Computing, 14:209-223, 1996
[35] D.Rogahn, Road Sign Detection Using Neural Networks,
http://www.d.umn.edu/ drogahn/road sign detection.pdf, 1999-2000
[36] Margrit Betke, Nicholas Makris, Fast Object Recognition in Noisy Images Using Simulated Annealing, http://www.umiacs.umd.edu:80//users/betke/iccv95.ps
[37] Cleopatra project at PAC (Parallel Application Center), on-line:
http://www.pac.soton.ac.uk/
[38] Kotek Z., Marik V. et al. Metody rozpoznavani a jejich aplikace (Methods of Recognition and their Applications). Academia, 1993. ISBN 80-200-0297-9
[39] Duin R.P.W. Introduction to Statistical Pattern Recognition. Pattern
Recognition Group, Delft University of Technology, 1997
[40] P. Paclik. The Automated Classification of Road Signs. Faculty of Transportation Sciences, Czech Technical University, Prague, 1998
[41] Pratt W.K. Digital Image Processing. John Wiley and Sons, New York,
1978
[42] Mertzios B.G. and Tsirikolias K. Statistical pattern recognition using efficient two-dimensional moments with applications to character recognition. Pattern Recognition, 1993. Vol. 26, No. 6, pp. 877-882
[43] Flusser J. and Suk T. Pattern Recognition by Affine Moment Invariants. Pattern Recognition, 1993. Vol. 26, pp. 167-174
[44] Oivind Due Trier, Anil K. Jain, Torfinn Taxt. Feature extraction methods for character recognition - a survey, 1995
[45] M. Pietikainen and D. Harwood. Segmentation of color images using edge-preserving filters. In V. Cappellini and R. Marconi, editors, Advances in Image Processing and Pattern Recognition, pages 94-99. North-Holland, 1986
[46] V. Rehrmann. Stabile, echtzeitfahige Farbbildauswertung. Verlag Folbach,
Koblenz, 1994.
[47] M.M. Zadeh, T. Kasvand, C.Y. Suen. Localization and Recognition of Traffic Road Signs for Automated Vehicle Control. Concordia University, SPIE Intelligent Systems and Automated Manufacturing, 1998
[48] Rafael C. Gonzalez, Richard E. Woods. Digital Image Processing, Prentice Hall, 2002. ISBN 0-13-094650-8
[49] Metro, Monday 9 December 2002, p. 20
[50] DaimlerChrysler, The Thinking Vehicle,
web:http://www.daimlerchrysler.com, 2002
[51] M.Sonka, V.Hlavac, R.Boyle, Image Processing, Analysis, and Machine
Vision, 2nd. Edition, 1998, ISBN 0-534-95393-X