
Autonomous, Wireless Sensor Network-Assisted

Target Search and Mapping


by
Steffen Beyme
Dipl.-Ing. Electrical Engineering, Humboldt-Universität zu Berlin, 1991
A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL
STUDIES
(Electrical and Computer Engineering)
The University Of British Columbia
(Vancouver)
October 2014
© Steffen Beyme, 2014
Abstract
The requirements of wireless sensor networks for localization applications are
largely dictated by the need to estimate node positions and to establish routes to
dedicated gateways for user communication and control. These requirements add
significantly to the cost and complexity of such networks.
In some applications, such as autonomous exploration or search and rescue,
which may benefit greatly from the capabilities of wireless sensor networks, it
is necessary to guide an autonomous sensor and actuator platform to a target, for
example to acquire a large data payload from a sensor node, or to retrieve the target
outright.
We consider the scenario of a mobile platform capable of directly interrogating
individual, nearby sensor nodes. Assuming that a broadcast message originates
from a source node and propagates through the network by flooding, we study
applications of autonomous target search and mapping, using observations of the
message hop count alone. Complex computational and communication tasks are
offloaded from the sensor nodes, leading to significant simplifications of the node
hardware and software.
This introduces the need to model the hop count observations made by the mo-
bile platform to infer node locations. Using results from first-passage percolation
theory and a maximum entropy argument, we formulate a stochastic jump process
which approximates the message hop count at distance r from the source. We show
that the marginal distribution of this process has a simple analytic form whose pa-
rameters can be learned by maximum likelihood estimation.
Target search involving an autonomous mobile platform is modeled as a stochas-
tic planning problem, solved approximately through policy rollout. The cost-to-go
at the rollout horizon is approximated by an open-loop search plan in which path
constraints and assumptions about future information gains are relaxed. It is shown
that the performance is improved over typical information-driven approaches.
Finally, the hop count observation model is applied to an autonomous mapping
problem. The platform is guided under a myopic utility function which quantifies
the expected information gain of the inferred map. Utility function parameters are
adapted heuristically such that map inference improves, without the cost penalty of
true non-myopic planning.
Preface
Chapters 2 to 4 are based on manuscripts that to date have either been published, or
accepted or submitted for publication, in peer-reviewed journals and conferences.
All manuscripts were co-authored by the candidate as the first author, with revi-
sions and comments by Dr. Cyril Leung. In all these works, the candidate had the
primary responsibility for conducting the research, the design and performance of
simulations, results analysis and preparation of the manuscripts, under the supervi-
sion of Dr. Cyril Leung. The following list summarizes the publications resulting
from the candidate's PhD work:
S. Beyme and C. Leung, "Modeling the hop count distribution in wireless sensor networks," Proc. of the 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pages 1–6, May 2013.
S. Beyme and C. Leung, "A stochastic process model of the hop count distribution in wireless sensor networks," Elsevier Ad Hoc Networks, vol. 17, pages 60–70, June 2014.
S. Beyme and C. Leung, "Rollout algorithm for target search in a wireless sensor network," Proc. of the IEEE 80th Vehicular Technology Conference, Sept. 2014. Accepted.
S. Beyme and C. Leung, "Wireless sensor network-assisted, autonomous mapping with information-theoretic utility," 6th IEEE International Symposium on Wireless Vehicular Communications, Sept. 2014. Accepted.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . 1
1.2 Thesis Organization and Contributions . . . . . . . . . . . . . . . 4
2 Stochastic Process Model of the Hop Count in a WSN . . . . . . . . 6
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Motivation and Related Work . . . . . . . . . . . . . . . 7
2.1.2 Chapter Contribution . . . . . . . . . . . . . . . . . . . . 8
2.1.3 Chapter Organization . . . . . . . . . . . . . . . . . . . . 9
2.2 Wireless Sensor Network Model . . . . . . . . . . . . . . . . . . 9
2.2.1 Stochastic Geometry Background . . . . . . . . . . . . . 9
2.2.2 First-Passage Percolation . . . . . . . . . . . . . . . . . . 11
2.3 Stochastic Process Model for the Hop Count . . . . . . . . . . . . 14
2.3.1 Jump-type Lévy Processes . . . . . . . . . . . . . . . . . 14
2.3.2 Maximum Entropy Model . . . . . . . . . . . . . . . . . 17
2.3.3 Maximum Likelihood Fit of the Hop Count Process . . . . 19
2.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 21
2.4.2 Hop Count Distribution . . . . . . . . . . . . . . . . . . . 22
2.4.3 Localization of Source Node . . . . . . . . . . . . . . . . 23
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Rollout Algorithms for WSN-assisted Target Search . . . . . . . . . 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.1 Background and Motivation . . . . . . . . . . . . . . . . 34
3.1.2 Chapter Contribution . . . . . . . . . . . . . . . . . . . . 36
3.1.3 Chapter Organization . . . . . . . . . . . . . . . . . . . . 37
3.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.1 Wireless Sensor Network Model . . . . . . . . . . . . . . 37
3.2.2 Autonomous Mobile Searcher . . . . . . . . . . . . . . . 39
3.2.3 Formulation of Target Search Problem . . . . . . . . . . . 40
3.3 Approximate Online Solution of POMDP by Rollout . . . . . . . 44
3.3.1 Rollout Algorithm . . . . . . . . . . . . . . . . . . . . . 44
3.3.2 Parallel Rollout . . . . . . . . . . . . . . . . . . . . . . . 44
3.4 Heuristics for the Expected Search Time . . . . . . . . . . . . . . 45
3.4.1 Constrained Search Path . . . . . . . . . . . . . . . . . . 46
3.4.2 Relaxation of Search Path Constraint . . . . . . . . . . . 46
3.5 Information-Driven Target Search . . . . . . . . . . . . . . . . . 50
3.5.1 Mutual Information Utility . . . . . . . . . . . . . . . . . 50
3.5.2 Infotaxis and Mutual Information . . . . . . . . . . . . . 51
3.6 A Lower Bound on Search Time for Multiple Searchers . . . . . . 52
3.7 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.7.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 53
3.7.2 Idealized Observations . . . . . . . . . . . . . . . . . . . 55
3.7.3 Empirical Observations . . . . . . . . . . . . . . . . . . . 56
3.8 Statistical Dependence of Observations . . . . . . . . . . . . . . 57
3.8.1 Mitigation of Observation Dependence . . . . . . . . . . 59
3.8.2 Explicit Model of Observation Dependence . . . . . . . . 61
3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4 WSN-assisted Autonomous Mapping . . . . . . . . . . . . . . . . . . 67
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.1 Background and Motivation . . . . . . . . . . . . . . . . 68
4.1.2 Chapter Contribution . . . . . . . . . . . . . . . . . . . . 70
4.1.3 Chapter Organization . . . . . . . . . . . . . . . . . . . . 70
4.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2.1 Wireless Sensor Network Model . . . . . . . . . . . . . . 70
4.2.2 Autonomous Mapper . . . . . . . . . . . . . . . . . . . . 72
4.3 Mapping Path Planning . . . . . . . . . . . . . . . . . . . . . . . 74
4.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.4.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 76
4.4.2 Simulation of Map Inference . . . . . . . . . . . . . . . . 77
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . 81
5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.2.1 Parametric Models of the Hop Count Distribution . . . . . 84
5.2.2 Statistical Dependence of Hop Count Observations . . . . 85
5.2.3 Simulation-based Observation Models . . . . . . . . . . . 85
5.2.4 Multi-Modal Observations . . . . . . . . . . . . . . . . . 85
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A Necessary Condition for the Hop Count Process . . . . . . . . . . . 97
B Proof of Strong Mixing Property . . . . . . . . . . . . . . . . . . . . 99
C Constrained-Path Search as Integer Program . . . . . . . . . . . . . 102
C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
C.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
C.2.1 Minimizing the Expected Search Time . . . . . . . . . . . 104
C.2.2 Maximizing the Detection Probability . . . . . . . . . . . 106
C.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . 107
C.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
D Infotaxis and Mutual Information . . . . . . . . . . . . . . . . . . . 114
D.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
D.2 Proof of Equivalence . . . . . . . . . . . . . . . . . . . . . . . . 114
D.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
E Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
List of Tables
Table 4.1 Map average entropy and MSE, $w_i$ adapted according to (4.16) . . . 78
Table 4.2 Map average entropy and MSE, $w_i$ adapted according to (4.17) . . . 78
List of Figures
Figure 2.1 Realization of a random geometric graph . . . . . . . . . . . 10
Figure 2.2 CDFs of the translated Poisson model and the empirical hop
count in a 2D network, for mean node degree 8 . . . . . . . . 25
Figure 2.3 CDFs of the translated Poisson model and the empirical hop
count in a 2D network, for mean node degree 16 . . . . . . . 26
Figure 2.4 CDFs of the translated Poisson model and the empirical hop
count in a 2D network, for mean node degree 40 . . . . . . . 27
Figure 2.5 Kullback-Leibler divergence (KLD) between empirical distri-
bution and translated Poisson distribution . . . . . . . . . . . 28
Figure 2.6 CDFs of the translated Poisson model and the empirical hop
count in a 1D network, for mean node degree 8 . . . . . . . . 29
Figure 2.7 CDFs of the translated Poisson model and the empirical hop
count in a 1D network, for mean node degree 16 . . . . . . . 30
Figure 2.8 CDFs of the translated Poisson model and the empirical hop
count in a 1D network, for mean node degree 40 . . . . . . . 31
Figure 2.9 CDFs of the normalized localization error, given 8 hop count
observations, for mean node degrees 8 and 40. . . . . . . . . . 32
Figure 3.1 Dynamic Bayes Network representing a POMDP . . . . . . . 42
Figure 3.2 CDF of the search time for rollout under random action selec-
tion, compared to the search time for myopic mutual informa-
tion utility. Rollout horizon H = 4, hop counts generated by
model M3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Figure 3.3 CDF of the search time for rollout under random action se-
lection, compared to the search time for non-myopic mutual
information utility. Rollout horizon H = 4, hop counts gener-
ated by model M3 . . . . . . . . . . . . . . . . . . . . . . . 58
Figure 3.4 CDFs of the search time for rollout under 3 different base poli-
cies: random, constant and greedy action selection. Rollout
horizon H = 4, hop counts generated by model M3 . . . . . . 59
Figure 3.5 CDF of the search time for rollout under random action, com-
pared to 2 parallel rollout approaches: random and constant
action selection, random and greedy action selection. Rollout
horizon H = 4, hop counts generated by model M3 . . . . . . 60
Figure 3.6 CDF of the search time for rollout under random action, com-
pared to 2 parallel rollout approaches: random and constant
action selection, random and greedy action selection. Rollout
horizon H = 2, hop counts generated by model M3 . . . . . . 61
Figure 3.7 CDFs of the search time for rollout under random action, with
horizon H = 4, for the 3 hop count generation models M1, M2
and M3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Figure 3.8 CDFs of the search time for rollout under random action, with
horizon H = 4 and for the 3 hop count generation models M1,
M2 and M3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Figure 3.9 Deviation of local average hop count from the model mean . . 65
Figure 3.10 Correlation between hop count observations, as a function of
the distance from the source node and the inter-observation
distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 4.1 True map, inferred map and autonomous mapper path . . . . . 79
Figure 4.2 Average entropy per map element . . . . . . . . . . . . . . . 80
Figure 4.3 Mean square error between true and inferred map . . . . . . . 80
Figure C.1 Search path of minimum expected search time . . . . . . . . . 109
Figure C.2 Search path of maximum probability of detection . . . . . . . 110
Figure C.3 Cumulative probability of detection for the two search policies
of minimum expected search time and maximum probability
of detection, and for the optimal search without path constraint 111
Figure C.4 Search path for unpenalized objective function . . . . . . . . 112
Figure C.5 Cumulative probability of detection for unpenalized objective
function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
List of Acronyms
The following acronyms are frequently used in this thesis:
KLD Kullback-Leibler divergence
MLE Maximum likelihood estimation
MSE Mean square error
POMDP Partially observable Markov decision process
WSN Wireless sensor network
Acknowledgments
The experience of graduate school at UBC has been rewarding and fulfilling, and I
owe a debt of gratitude to many people here, whose support made this possible.
I reserve special thanks for my thesis supervisor and mentor, Dr. Cyril Leung,
whom I have had the privilege to work with. His invaluable insight and continued
encouragement have been an inspiration throughout this journey of thesis research.
He let me explore with great freedom and offered the guidance and the support
without which this work would not have progressed to this point.
I would like to thank my supervisory committee members, Dr. Vikram Krish-
namurthy and Dr. Z. Jane Wang, for their advice and much appreciated feedback.
I would also like to thank the faculty, staff and fellow students in the Depart-
ment of Electrical and Computer Engineering, who all contributed to create a stim-
ulating research environment.
This work was supported in part by the Natural Sciences and Engineering Re-
search Council (NSERC) of Canada under Grants OGP0001731 and 1731-2013
and by the UBC PMC-Sierra Professorship in Networking and Communications.
Finally, I owe enduring gratitude to my parents, for the love and encouragement
they have provided. My greatest thanks go to my wife, Beatriz, and to our children,
Carl and Alex, whose love and patience have let me see this thesis through.
To my Family
Chapter 1
Introduction
1.1 Background and Motivation
Both target localization and mapping are important applications of wireless sensor
networks (WSNs) [80] as well as autonomous robotics [110]. Much research con-
tinues to be devoted to advance the state-of-the-art in these elds and to expand and
enhance the operational capabilities of these complementary technologies. This
thesis considers the joint application of wireless sensor networks and autonomous
robotics.
Some applications, such as autonomous exploration or search and rescue, would benefit from both the pervasive sensor coverage which only WSNs can provide, and the mobility of a single or multiple autonomous sensor and actuator platforms. In a joint application, a typical mission would involve the ad hoc deployment of a WSN (due to the circumstances of the mission, often in a random manner) and an autonomous platform able to locate one or several targets, or to infer a map, by interacting with the WSN. The objective can range from collecting a large data payload from a sensor node that has reported an event of interest (but energy constraints prevent the actual, recorded observation data from being forwarded through the wireless sensor network), to moving more sophisticated sensing or actuating capabilities closer to the site of interest (as in planetary exploration), to retrieving the target outright. A survey of related applications at the intersection of WSNs and autonomous robotics can be found in [91]. A method for localization and
autonomous navigation using WSNs was recently described in [22].
Many WSNs used for localization are organized around the concept of location-
aware sensor nodes which forward individual target location estimates to one or
several sink nodes, using multi-hop communication. There can be varying degrees
of local cooperation between nodes to enhance the sensing and localization perfor-
mance [80]. The purpose of the sinks is to aggregate, or fuse, information from
multiple sensor nodes to improve the location estimates and ultimately, to serve
as gateways through which a user communicates with the WSN. The sinks can
be dedicated nodes, or may be dynamically selected from the population of sen-
sor nodes (for example based on the available amount of energy) to act as fusion
centers. Node location awareness can be addressed by the use of special-purpose
localization hardware (e.g. GPS or time/angle of arrival) and associated protocols.
The need to establish node positions and to maintain multi-hop routes from the
sensor nodes to the statically or dynamically assigned fusion centers thus drives
many of the hardware and communication requirements of WSNs and contributes
significantly to their cost and complexity.
As an approach to reduce the cost and complexity of WSNs used by the ap-
plications considered in this thesis, we assume that the sensor nodes are location-
agnostic and that an autonomous platform, assumed to be location-aware, acts as a
mobile sink by directly interrogating nearby sensor nodes. Instead of discovering
and maintaining routes to the sink, a simple message broadcast protocol is respon-
sible for information dissemination in the WSN. As a consequence, the need for
special-purpose hardware to support node localization, as well as complex routing
strategies and dedicated sink nodes for performing in-situ sensor fusion, is elimi-
nated. By offloading expensive computational and communication tasks from the
sensor nodes to the autonomous platform, a significant simplification of the sen-
sor node hardware and software requirements can be achieved. In this thesis, only
observations of the hop count of a message originating from a source node are
assumed to be available to the autonomous platform, to infer the location of the
source node.
Although we do not pursue the concept further in this thesis, an autonomous
sensor platform allows for the seamless integration of WSN hop count observations
with the platform's on-board sensing capabilities, which are often sophisticated and
complementary to the WSN and may include laser or ultrasound ranging, imaging
etc. Similarly, it is conceivable to use hop count based localization in combination
with other low-cost methods of WSN node localization, to obtain overall improved
node location estimates. This includes for example methods based on received
signal strength (RSS), which add little or no extra cost to the sensor nodes [12].
The joint application of WSNs and autonomous platforms for target search and
mapping raises the need for an observation model which relates the statistics of the
hop count of a broadcast message to the distance from the source node in an appro-
priately defined WSN. This problem is central to localization methods referred to as
range-free [102]. However, for many reasonable models of WSNs, the character-
ization of the hop count statistics, given the source-to-sink distance, remains a chal-
lenging, open problem for which only approximations are known, which in many
cases can only be evaluated at significant computational cost [24, 72, 82, 107]. Under certain assumptions, which include linear observations and a Gaussian error model, the Kalman filter is optimal for localization (these assumptions are somewhat relaxed in the extended Kalman filter) [57]. However, the hop count observations in a WSN are not well characterized by this model, generally requiring nonparametric Bayesian methods to compute the a posteriori probability density function of the target. Typically performed by grid or particle filters, these methods require a large number of numerical evaluations of the observation model [110]. It is therefore important to develop models that have low computational complexity
and characterize the hop count reasonably well. This is the subject of Chapter 2 of
this thesis.
The search for a (generally moving) target by an autonomous platform based
on observations of the hop count of a broadcast message can be described as a
stochastic planning problem. The general framework for this type of problem is
the partially observable Markov decision process (POMDP) [69]. Unfortunately,
for most problems of practical relevance, solving the POMDP exactly is compu-
tationally intractable. This has given rise to the need for approximate, suboptimal
solutions which can achieve acceptable performance, many of which are based on
online Monte Carlo simulation [85, 106]. However, even suboptimal planning al-
gorithms can still present formidable computational challenges. In this thesis, we
propose the use of an efficient heuristic to limit the computational cost of a policy
rollout algorithm [8, 17], which is the subject of Chapter 3. The computational
requirements of Monte Carlo methods such as policy rollout further magnify the
need for observation models of low complexity.
Closely related to localization is autonomous mapping [25, 108], which we
consider in Chapter 4. Autonomous mapping platforms are typically equipped with
sensors such as range and direction finders or cameras, and are therefore geared
towards the mapping of physical objects (or obstacles) in the environment. In con-
trast, the pervasive sensor coverage of WSNs enables the dense mapping of quan-
tities such as the concentration of chemicals, vibrations and many others, whose
measurement requires close physical contact with the sensor. With WSN-assisted
mapping, the data association problems [110] inherent in many mapping applica-
tions can be sidestepped quite easily. Another difficult problem in autonomous mapping is the planning of an optimal path along which observations are made, such that the mean error between the true and the inferred map is minimized, usually over a finite time horizon. Due to the curse of dimensionality, this problem is generally intractable and good approximate techniques to find a near-optimal path
are required for any practical application.
1.2 Thesis Organization and Contributions
Thesis Organization
The subject of Chapter 2 is the derivation of an observation model, which relates
the hop count distribution of a broadcast message in a suitably dened WSN to the
source-to-sink distance. We evaluate the model by comparison with the empirical
hop count in a simulated WSN. This model is the basis for Chapter 3, in which we
study target search by an autonomous platform as a stochastic planning problem,
and use Monte Carlo techniques to solve it approximately. In Chapter 4, we study
the problemof path planning for an autonomous platformwhich relies on hop count
observations from the WSN to infer the map of sensor measurements. As in the
preceding chapter, approximate solution methods are developed as the path plan-
ning problem is generally intractable. Finally, Chapter 5 summarizes conclusions
from this work and provides a few suggestions for future research.
Contributions
The thesis makes the following contributions:
In Chapter 2, we formulate a stochastic jump process whose marginal dis-
tribution has a simple analytical form and models the hop count of the first-
passage path from a source to a sink node.
In Chapter 3, we model the target search as an infinite-horizon, undiscounted
cost, online POMDP [69] and solve it approximately through policy rollout
[8]. The terminal cost at the rollout horizon is described by a heuristic based
on a relaxed, optimal search problem.
We show that a target search problem described in terms of an explicit trade-
off between exploitation and exploration (referred to as infotaxis [112]) is
mathematically equivalent to a target search with a myopic mutual informa-
tion utility.
A lower bound on the expected search time for multiple uniformly dis-
tributed searchers is given in terms of the searcher density, based on the
contact distance [4] in Poisson point processes.
We propose an integer autoregressive INAR(1) process [1] with translated
Poisson innovations, as a model for the statistical dependence of hop count
observations in a WSN.
In Chapter 4, we propose a myopic, information-theoretic utility function
for path planning in an autonomous mapping application. Utility function
parameters are heuristically adapted to offset the myopic nature of the utility
and achieve improved performance of the map inference.
Chapter 2
Stochastic Process Model of the
Hop Count in a WSN
2.1 Introduction
In this chapter, we consider target localization in randomly deployed multi-hop
wireless sensor networks, where messages originating from a sensor node are
broadcast by flooding and the node-to-node message delays are characterized by
independent, exponential random variables. Using asymptotic results from first-
passage percolation theory and a maximum entropy argument, we formulate a
stochastic jump process to approximate the hop count of a message at distance
r from the source node. The resulting marginal distribution of the process has the
form of a translated Poisson distribution which characterizes observations reason-
ably well and whose parameters can be learned, for example by maximum like-
lihood estimation. This result is important in Bayesian target localization, where
mobile or stationary sinks of known position may use hop count observations, con-
ditioned on the Euclidean distance, to estimate the position of a sensor node or
an event of interest within the network. For a target localization problem with a
fixed number of hop count observations, taken at randomly selected sensor nodes,
simulation results show that the proposed model provides reasonably good location
error performance, especially for densely connected networks.
2.1.1 Motivation and Related Work
Target localization in wireless sensor networks (WSNs) is an active area of re-
search with wide applicability. Due to power and interference constraints, the vast
majority of WSNs convey messages via multiple hops from a source to one or
several sinks, mobile or stationary. Localization techniques which exploit the in-
formation about the Euclidean distance from a sensor node, contained in the hop
count of a message originating from that node, are referred to as range-free [102].
Range-free localization is applicable to networks of typically low-cost, low-power
wireless sensor nodes without the hardware resources needed to accurately mea-
sure node positions, neighbor distances or angles (for example using GPS, time or
angle of arrival). It is therefore an attractive approach in situations where a com-
promise is sought between localization accuracy on one hand, and cost, size and
power efficiency on the other.
Various hop-count based localization techniques for WSNs have been pro-
posed; for a survey see e.g. [102]. Relating hop count information to the Eu-
clidean distance between sensor nodes, exemplified by the probability distribution
of the hop count conditioned on distance, remains a challenging problem. Except
in special cases, such as one-dimensional networks [2], only approximations can
be obtained; such approximations are often in the form of recursions, which tend
to be difficult to evaluate [24, 72, 107].
Moreover, the hop count depends on the chosen path from the source to the sink
and is therefore a function of the routing method employed by the network. For ex-
ample, an approximate closed-form hop count distribution is proposed in [82] and
evaluated for nearest, furthest and random neighbor routing, in which a forwarding
node selects the next node from a semi-circular neighborhood oriented towards the
sink, under the assumption that this neighborhood is not empty. Some of the ex-
isting localization algorithms, such as the DV-hop algorithm [77] and its variants,
dene the hop count between nodes as that of the shortest path [24, 72, 107]. Other
localization algorithms use the hop count of a path established through greedy for-
ward routing [20, 59, 76, 113, 117], that is, a path which makes maximum progress
toward the sink with every hop. In most cases, the overhead incurred by establish-
ing and maintaining routes is not negligible. Simpler alternatives may be needed
when sensor nodes impose more severe complexity constraints.
This chapter is motivated by the range-free target localization problem in net-
works of position-agnostic wireless sensor nodes, which broadcast messages using
flooding under the assumption that node-to-node message delays can be character-
ized by independent, exponential random variables. This is a reasonable assump-
tion in situations where sensor nodes enter a dormant state while harvesting energy
from the environment and wake up at random times, or when the communication
channel is unreliable and retransmissions are required. Under these conditions, a
first-passage path emerges as the path of minimum passage time from a source to
a sink. Networks of this type can be described in terms of first-passage percolation
[23, 33].
Localization of the source node may be performed by mobile or stationary sinks able to fuse hop count observations $Z \in \mathbb{N}$ to infer the location $X \in \mathbb{R}^2$ of the source node, where $p_X(x)$ denotes the a priori pdf of $X$. By Bayes' rule, the a posteriori pdf of the source location is $p_X(x \mid z) \propto p_Z(z \mid x)\, p_X(x)$, conditioned on observing the hop count $z$ at the sink position. Knowledge of the observation model $p_Z(z \mid x)$, that is, the conditional pdf of the hop count given the source location hypothesis $x$, is essential for the success of this approach, which may be complicated further by the presence of model parameters whose values are not known a priori and must be learned on- or off-line. Bayesian localization involves a large number of numerical evaluations of the observation model, due to the typically large space of location hypotheses. This creates a need for observation models with low computational complexity, which may outweigh the need for high accuracy in some applications.
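To make the computational burden concrete, the following is a minimal sketch (not taken from the thesis) of one grid-based Bayesian update of the source-location posterior. It assumes a generic observation model $p_Z(z \mid r)$; the translated Poisson model derived later in this chapter is used here only for illustration, and all function names, grid sizes and parameter values are placeholders.

```python
import numpy as np
from scipy.stats import poisson

def translated_poisson_pmf(z, r, theta1, theta2, R):
    """Illustrative hop count model p_Z(z | r): translated Poisson, cf. Section 2.3."""
    kappa = np.ceil(r / R)                        # minimum hop count at distance r
    lam = np.maximum(theta1 * r + theta2 - kappa, 1e-9)
    return np.where(z >= kappa, poisson.pmf(z - kappa, lam), 0.0)

def bayes_update(prior, grid_x, grid_y, sink_pos, z, obs_model, **params):
    """One Bayesian update p_X(x|z) ~ p_Z(z|x) p_X(x) over a grid of hypotheses."""
    r = np.hypot(grid_x - sink_pos[0], grid_y - sink_pos[1])
    likelihood = obs_model(z, r, **params)        # one model evaluation per hypothesis
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Uniform prior over the unit square; one hop count observation z = 7 made at (0.2, 0.8).
grid_x, grid_y = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
prior = np.full(grid_x.shape, 1.0 / grid_x.size)
post = bayes_update(prior, grid_x, grid_y, (0.2, 0.8), z=7,
                    obs_model=translated_poisson_pmf,
                    theta1=25.0, theta2=1.0, R=0.05)
```

Each update evaluates the observation model once per grid cell, which is why a cheap closed-form model is preferable to a recursive approximation.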
2.1.2 Chapter Contribution
The main contribution of this chapter is the formulation of a stochastic jump process whose marginal distribution has a simple analytical form, to model the hop count of the first-passage path from a source to a sink, which is at distance $r$. In contrast to earlier works [20, 24, 72, 76, 82, 107, 113, 117], which generally use geometric arguments to derive expressions for the hop count distribution, our approach utilizes the abstract model of a jump process and describes the hop count in terms of the marginal distribution of this process. Starting with a stochastic process of stationary increments satisfying a strong mixing condition, we make a simplifying independence assumption which allows the hop count to be modeled as a jump Lévy process with drift [6, 29]. We show that, consistent with our assumptions about the hop count, the maximum entropy principle [52] leads to the selection of a translated Poisson distribution as the marginal distribution of the hop count model process.
2.1.3 Chapter Organization
This chapter is structured as follows: in Section 2.2, we review relevant concepts
from stochastic geometry and first-passage percolation and introduce our wireless
sensor network model. Our main result, the stochastic process $Z_r$ which models
the observed hop count distribution at distance r from a source node, is derived in
Section 2.3. We describe how the parameters of this process can be learned using
maximum likelihood estimation. In Section 2.4, simulation results are presented
which show that in sensor networks of the type considered in this chapter, the
marginal distribution of the model process approximates the empirical hop count
reasonably well. Furthermore, we study the localization error due to the approx-
imation by comparison with a fictitious, idealized network in which observations
are generated as independent draws from our model. Proofs of propositions used
to derive our model are given in Appendix A and B. This chapter is based on
manuscripts published in [9, 10].
2.2 Wireless Sensor Network Model
2.2.1 Stochastic Geometry Background
The geometry of randomly deployed WSNs is commonly described by Gilbert's disk model [35], a special case of the more general Boolean model [104]. Given a spatial Poisson point process $\mathcal{P}_\rho = \{X_i : i \in \mathbb{N}\}$ of density $\rho$ on $\mathbb{R}^2$, two sensor nodes are said to be linked if they are within communication range $R$ of each other. Gilbert's model induces a random geometric graph $G_{\rho,R} = (\mathcal{P}_\rho, E_{\rho,R})$ with node set $\mathcal{P}_\rho$ and edge set $E_{\rho,R}$ (Figure 2.1). The node density and the communication range $R$ are related through the mean node degree $\mu = \rho \pi R^2$, so that the graph can be defined equivalently in terms of a single parameter as $G_\mu$. Without loss of generality, it is convenient to condition the point process on there being a node at the origin. By Slivnyak's Theorem [4], if we remove the point at the origin from consideration, we have a Poisson point process.
Figure 2.1: Realization of a random geometric graph $G_\mu$ restricted to $[0, 1]^2$
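A realization such as the one in Figure 2.1 can be generated along the following lines. This is an illustrative sketch (not the thesis code) that uses the relations defined above, namely a Poisson number of nodes, links within range $R$, and $\mu = \rho \pi R^2$; the function name and default parameter values are placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

def random_geometric_graph(rho=4000.0, mu=16.0, rng=None):
    """Sample one realization of G_mu on the unit square.

    rho: node density (expected number of nodes per unit area)
    mu:  target mean node degree; the range R follows from mu = rho * pi * R^2
    """
    rng = rng or np.random.default_rng()
    R = np.sqrt(mu / (rho * np.pi))              # communication range implied by mu
    n = rng.poisson(rho)                         # Poisson number of nodes on [0,1]^2
    pos = rng.uniform(0.0, 1.0, size=(n, 2))
    edges = cKDTree(pos).query_pairs(r=R)        # all node pairs within range R
    return pos, edges, R

pos, edges, R = random_geometric_graph()
print(f"{len(pos)} nodes, {len(edges)} edges, R = {R:.4f}")
```

The empirical mean node degree of such a realization, $2|E|/n$, should be close to $\mu$ away from the boundary of the unit square.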
Of key interest in the study of $G_\mu$ is continuum percolation [13, 100], that is, the conditions under which a cluster of infinitely many connected nodes emerges in such a graph, and the probability that the node under consideration (without loss of generality, the origin) belongs to this cluster. Percolation is widely known to exhibit a phase transition from a subcritical regime characterized by the existence of only clusters with a finite number of nodes, to the formation of a single infinite cluster with probability 1, as $\mu$ is increased above a critical threshold $\mu_c$. The percolation probability is defined as the probability that the origin belongs to the infinite cluster. It is zero for values of $\mu$ below the threshold, and an increasing function of $\mu$ above. The phase transition is interesting in its own right and much research is devoted to the analysis of critical phenomena [100]. Just above the critical threshold, the incipient infinite connected cluster has characteristics which are generally undesirable for a communication network, such as large minimum path lengths, which diverge at the critical threshold. A universal property of all systems exhibiting a phase transition is that characteristic quantities display power-law divergence near the threshold [100]. Therefore, the mean node degree (or equivalently, the node density or the transmission range) is generally chosen by the designer so that the network operates sufficiently deep in the supercritical regime, i.e. it is strongly supercritical. As the simulations will show, the model hop count distribution is a good approximation of the empirical hop count for large values of the mean node degree, but deviates more noticeably when the mean node degree approaches the critical threshold $\mu_c$.
2.2.2 First-Passage Percolation
The dissemination of messages in some broadcast WSNs has been described in
terms of first-passage percolation [23, 33, 34, 64], where every node forwards the broadcast message to all of its neighbors, so that over time, a message cluster forms (illustrating the relationship between first-passage percolation and random growth models [30, 84]). This type of information dissemination is commonly referred to as flooding. We assume that the node-to-node message delays can be characterized
by independent, exponentially distributed random variables with a common mean.
This is a reasonable assumption for WSNs in which the nodes enter a dormant state
[23, 39], while replenishing their energy storage by harvesting from unpredictable
environmental sources, and thus transmit at random times. The independence of
the node-to-node delays in the standard first-passage percolation model implies
that transmissions do not interfere with each other. Various approaches to mitigate
interference exist: for simplicity, we postulate the allocation of orthogonal trans-
mission resources with a sufficiently large reuse factor [75], so that interference
can be neglected. In addition, we assume that multiple simultaneous broadcasts
initiated by the same or by different source nodes do not interact, i.e. hop count ob-
servations associated with different broadcasts are independent. Under real-world
conditions, this would not be very efficient and one might consider e.g. schemes of message aggregation. Finally, unless some form of authentication is introduced, it is conceivable to disrupt such a system intentionally. Messages convey the cumulative number of hops until observed by a sink. The initial message is tagged
with a unique ID by the source node, so that duplicates can be recognized eas-
ily and discarded by the receiving nodes to prevent unnecessary (and undesirable)
retransmissions.
First-passage percolation was introduced in [41] as a model of e.g. fluid transport in random media and has been extensively studied since [47]. In first-passage percolation, the edges $E$ of a graph $G$ are assigned independent, identically distributed non-negative random passage times $\{\tau(e) : e \in E\}$ with the common law $F_\tau$. The passage time for a path $\pi$ in $G$ is then defined as the sum of the edge passage times

$$T(\pi) = \sum_{e \in \pi} \tau(e). \qquad (2.1)$$

A path $\pi(x, y)$ from node $x$ to $y$, defined by the ordered set of edges $(x, v_1), (v_1, v_2), \ldots, (v_{n-1}, y)$, has a hop count of $|\pi| = n$. Let $\Pi(x, y)$ be the set of all paths connecting $x$ and $y$. Then, the first-passage time from $x$ to $y$ is

$$T(x, y) = \inf_{\pi \in \Pi(x, y)} T(\pi). \qquad (2.2)$$

The first-passage time so defined induces a metric on the graph, with the minimizing paths referred to as first-passage paths, or geodesics. Because the edge passage times are non-negative, geodesics cannot contain loops, that is, they are necessarily self-avoiding paths connecting the end points. Rather than studying the random geometric graph $G_\mu$, it is sometimes more convenient to consider the unit distance graph on the square lattice $\mathbb{Z}^2$, for example when appealing to translational invariance. In fact, most classical results in first-passage percolation were established in the latter setting [41, 95, 96, 115]. On the square lattice, the first-passage time is often studied between points $m\mathbf{e}_x$ and $n\mathbf{e}_x$ on the x-axis with $m < n$, where $\mathbf{e}_x$ is the unit vector in the direction of the x-axis, and denoted $T_{m,n} = T(m\mathbf{e}_x, n\mathbf{e}_x)$. The first-passage time is a subadditive stochastic process, that is

$$T_{0,n} \le T_{0,m} + T_{m,n} \quad \text{whenever } 0 < m < n. \qquad (2.3)$$

Under suitable conditions, the first-passage time is characterized by the asymptotic time constant $\nu$, defined through

$$\lim_{n \to \infty} \frac{T_{0,n}}{n} = \nu(F_\tau) = \nu. \qquad (2.4)$$

This limit has been shown to exist almost surely (Kingman's subadditive ergodic theorem [61], and a version with weaker conditions in [65]). Despite much effort, many open problems remain in first-passage percolation; in particular, it is not known if the first-passage time $T_{0,n}$ has a limiting distribution [47].
Compared to the rather extensive study of the first-passage time constant, and although closely related, the path length, or hop count, has attracted less attention in the literature on first-passage percolation. Let $N_{0,n}$ denote the hop count of the first-passage path between $o$ and $n\mathbf{e}_x$ with first-passage time $T_{0,n}$ (if this path is not unique, select the one with the smallest hop count to break ties). Suppose now that all edge passage times $\tau(e)$ are subject to a small perturbation $s$. Intuitively, the hop count $N_{0,n}$ is found to be the derivative, if one exists, of the first-passage time with respect to $s$ [96]

$$N_{0,n} = \frac{d}{ds} T_{0,n} \Big|_{s=0}. \qquad (2.5)$$

Let $(F_\tau \oplus s)(t) = F_\tau(t - s)$ denote the law of the perturbed edge passage times. Under the assumption that $\tau$ is bounded below away from zero, i.e. $F_\tau(a) = 0$ for some $a > 0$, it is shown in [41] that the time constant $\nu(F_\tau \oplus s)$ has a derivative with respect to $s$ for almost all $s$, denoted by $\nu'(s)$. Furthermore, if $\nu'(0)$ exists, it is established that

$$\lim_{n \to \infty} \frac{N_{0,n}}{n} = \nu'(0) \quad \text{in probability}. \qquad (2.6)$$

This result implies that the number of hops per unit distance is asymptotically constant. The technical assumption that $F_\tau(t)$ is bounded away from zero is naturally satisfied in the sensor network setting, because message forwarding by any real sensor node incurs a minimum, non-zero latency.
It is important to point out that the first-passage path, and therefore the hop count, is invariant to the common mean of the edge passage times. This can be proven by a simple change of the unit of time. In turn, the mean edge passage time may be sensitive to the operating conditions of the WSN, such as the average energy available from environmental sources. A very desirable consequence is that the statistics of the hop count are completely determined by design parameters and do not depend on operating conditions.

The asymptotically linear relationship between the first-passage time and the Euclidean distance from the source node has been demonstrated in [23, 33, 64], essentially validating (2.4) for the types of WSNs studied by the authors. To the best of our knowledge, the model presented in this chapter is the first to express the hop count of first-passage paths at distance $r$ from the source as the marginal distribution of a suitable stochastic jump process.
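For simulation purposes, the first-passage times (2.2) and the associated hop counts over i.i.d. exponential edge delays can be computed with a standard shortest-path routine. The sketch below is illustrative only (it is not the thesis code); it uses Dijkstra's algorithm and records the hop count of the minimizing path. With continuous delays, ties between paths occur with probability zero.

```python
import heapq
import math
import random
from collections import defaultdict

def first_passage(adjacency, source, mean_delay=1.0, rng=random):
    """First-passage times T(source, y), eq. (2.2), and hop counts of the geodesics.

    adjacency: dict mapping each node to an iterable of its neighbours.
    Every edge gets an independent exponential passage time with a common mean,
    and Dijkstra's algorithm realizes the infimum over all connecting paths.
    """
    delays = {}
    def edge_delay(u, v):
        key = (u, v) if u <= v else (v, u)
        if key not in delays:
            delays[key] = rng.expovariate(1.0 / mean_delay)
        return delays[key]

    T = defaultdict(lambda: math.inf)    # first-passage time from the source
    N = {}                               # hop count of the first-passage path
    T[source], N[source] = 0.0, 0
    heap = [(0.0, 0, source)]
    while heap:
        t, hops, u = heapq.heappop(heap)
        if t > T[u]:
            continue                     # stale heap entry
        for v in adjacency[u]:
            cand = t + edge_delay(u, v)
            if cand < T[v]:
                T[v], N[v] = cand, hops + 1
                heapq.heappush(heap, (cand, hops + 1, v))
    return T, N
```

Applied to the adjacency structure of a random geometric graph, N then contains, for every reachable node, the hop count observation that the model developed in Section 2.3 approximates.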
2.3 Stochastic Process Model for the Hop Count
2.3.1 Jump-type Lévy Processes
Despite much effort, general results for the limiting distribution of the first-passage time and, by extension, the hop count, remain elusive, among other unsolved problems in the theory of first-passage percolation [47]. Moreover, if the objective is to describe the hop count distribution in WSNs, where the typical distances from the source can not be characterized as asymptotic, knowledge of limiting distributions is of little value. Therefore, an attempt is made here to model the hop count by a stochastic process, motivated by several simplifying assumptions. The resulting process is characterized by a simple parametric marginal distribution $p_{Z_r}(z)$ for the hop count $Z$ at distance $r$. Simulation results are presented in Section 2.4.2 to validate the model. We also evaluate its performance in a sensor node localization problem in Section 2.4.3.
Initially, we consider a WSN on the square lattice and specifically, the sequence of sensor nodes located at $\{kn\mathbf{e}_x : k \ge 0\}$, where $n \in \mathbb{N}$ defines their spacing. Due to the subadditivity property of the first-passage time (2.3), we have

$$T_{0,\ell n} \le \sum_{k=0}^{\ell - 1} T_{kn,(k+1)n}, \qquad \ell \in \mathbb{N}. \qquad (2.7)$$

Suppose now that we approximate the hop count $N_{0,\ell n}$ of the first-passage path by a stochastic process of the form

$$\tilde{N}_{0,\ell n} = \sum_{k=0}^{\ell - 1} N_{kn,(k+1)n}, \qquad \ell \in \mathbb{N}. \qquad (2.8)$$

A necessary, although not sufficient, condition for such a process to approximate the true hop count without systematic bias is therefore that $N_{0,n}$ is neither sub- nor superadditive. In Appendix A, we give a proof that the hop count of the first-passage path is not subadditive. The proof that the hop count is not superadditive uses an analogous argument and is therefore omitted. The process (2.8) has the following property, a proof of which is given in Appendix B:
Proposition 1. The hop count increments $N_{0,n}, N_{n,2n}, \ldots$ are strongly mixing.

That is to say, the hop count variables $N_{0,n}$ and $N_{kn,(k+1)n}$ may be considered independent for $k \to \infty$, or asymptotically independent. We use the mixing property to motivate a simplifying independence assumption for the increments of our hop count model process. The assumption of i.i.d. edge passage times implies that the hop count increments are also stationary. Stochastic processes with stationary and independent increments form the class of processes known as Lévy processes [6, 50].
Definition 2.1. A stochastic process $X_r$ is said to be a Lévy process, if it satisfies the following conditions:

1. stationary increments: the law of $X_r - X_s$ is the same as the law of $X_{r-s} - X_0$ for $0 \le s < r$,
2. independent increments: $X_r - X_s$ is independent of $X_t$, for $0 \le t \le s < r$,
3. $X_0 = 0$,
4. sample paths are right continuous with left limits.
Among these, jump-type Lévy processes [29] restricted to integer jump sizes can be considered prototypes for the hop count process. Lévy processes may furthermore include a deterministic component, which in our model represents the minimum hop count required to reach a sink at distance $r > 0$ due to the finite, one-hop transmission range $R$. The jump component of a Lévy process with integer jumps is characterized by a Lévy measure [6] of the form

$$m(x) = \sum_j \lambda_j\, \delta(x - n_j), \qquad n_j \in \mathbb{Z} \setminus \{0\}. \qquad (2.9)$$

The Lévy measure describes the distribution of the jumps of the stochastic process, in terms of the jump sizes $n_j$ and the associated intensities $\lambda_j$. The measure is bounded, that is

$$\sum_j \lambda_j < \infty. \qquad (2.10)$$

Later it will become necessary to specify a Lévy measure for our model. For now, we are interested in the general form of the marginal distribution of the Lévy process, which can be determined from its characteristic function, given by the Lévy-Khintchine formula [6]. For a measure of the form (2.9), the characteristic function is

$$\phi_{X_r}(u) = \exp\!\left( r \left( iub + \sum_j \lambda_j \big(e^{iun_j} - 1\big) \right) \right), \qquad (2.11)$$

which describes a compound Poisson process [50]

$$X_r = br + \sum_{i=0}^{N_r} Y_i, \qquad (2.12)$$

where $b \in \mathbb{R}$ denotes the rate of the deterministic, linear drift of the process and $N_r$ is a Poisson variable with expectation $\lambda r$, where $\lambda = \sum_j \lambda_j$. The random variables $Y_i$ are independent and identically distributed with $f_Y(x) = \lambda^{-1} m(x)$, also known as the compounding distribution.
The expectation of the jump Lévy process is linear in the distance $r$, as

$$\mathbb{E}X_r = -i\,\phi'_{X_r}(0) = r \left( b + \sum_j \lambda_j n_j \right), \qquad (2.13)$$

which implies that the average number of hops per unit distance is a constant,

$$\frac{\mathbb{E}X_r}{r} = b + \sum_j \lambda_j n_j. \qquad (2.14)$$

This property of the model process is consistent with the asymptotic behavior of the hop count in first-passage percolation (2.6), i.e. for sufficiently large distances, we expect the average hop count to increase linearly in $r$.

Equation (2.13) imposes constraints on the support and the mean of the distribution of the jump component of the process $X_r$, respectively. As the deterministic component $br$ in (2.13) represents the minimum hop count to reach a node at distance $r$, the support of the jump component of the process must be restricted to the non-negative integers. The mean of the jump component of $X_r$ is given by (2.13) as $\lambda_r = r \sum_j \lambda_j n_j$.
2.3.2 Maximum Entropy Model
We now turn to the selection of a specific Lévy measure, whose general structure is given by (2.9), for the jump component of the process $X_r$. This can be viewed as a model identification problem. Appealing to the law of parsimony (Occam's razor, [42, 70]), this heuristic can be used to select among all possible models one that is consistent with our knowledge and incorporates the least number of a priori assumptions or parameters. This criterion suggests the selection of the Lévy measure $m(x) = \lambda\,\delta(x - 1)$, giving rise to the simple Poisson process.

A more powerful argument can be based on the maximum entropy principle, due to Jaynes [51, 52], which states that among all distributions which satisfy an a priori given set of constraints representing our knowledge, we should select the one which is least informative, or more formally, has the maximum entropy subject to the constraints. The entropy of a probability mass function $\{f(k) = f_k : k \in \mathbb{Z}_+\}$ is defined as [19]

$$H(f) = -\sum_{k=0}^{\infty} f_k \log f_k. \qquad (2.15)$$
Consider for now the process with the characteristic function

$$\phi_{X_r}(u) = \left( \frac{p}{1 - e^{iu}(1 - p)} \right)^{r} \qquad (2.16)$$

corresponding to the negative binomial distribution $\mathrm{NB}(r, p)$, with support on the non-negative integers. This distribution is infinitely divisible and therefore, it is the distribution of some Lévy process [50]. If and only if $r = 1$, the marginal distribution of this process is geometric with parameter $p$. Among the discrete distributions supported on $\mathbb{Z}_+$ and subject to a mean constraint, the geometric distribution is well-known to possess maximum entropy [54]. Consequently, the Lévy process (2.16) achieves maximum entropy, however only for $r = 1$.

A maximum entropy Lévy process, on the other hand, corresponds to an infinitely divisible distribution which has maximum entropy for all $r \ge 0$, under suitable constraints. This condition is satisfied by the Poisson distribution, and [54, Theorem 2.5] asserts its maximum entropy status within the class of ultra log-concave distributions [66], which includes all Bernoulli sums and thus, the binomial family. This maximum entropy result reinforces the choice of a Poisson variable $Y_{\lambda_r}$ as the jump component of the Lévy process with drift, such that

$$X_r = br + Y_{\lambda_r}. \qquad (2.17)$$
Up to this point, we have ignored that the drift term $br$ is real-valued. In our application, the drift term represents the minimum hop count to reach a node at distance $r$ due to the finite transmission radius $R$ and it is appropriate to replace it by the integer

$$\kappa_r = \left\lceil \frac{r}{R} \right\rceil. \qquad (2.18)$$
We obtain the main result of this chapter, the hop count model process $\{Z_r : r \in \mathbb{R}_+\}$ with mean $\mu_r$, as

$$Z_r = \kappa_r + Y_{\lambda_r} \qquad (2.19)$$

where $\kappa_r$ is the minimum hop count and $Y_{\lambda_r}$ is a Poisson variable with a mean given by $\lambda_r = \mu_r - \kappa_r$. The process $Z_r$ has the marginal probability mass function

$$p_{Z_r}(z) = \mathrm{TPois}(z; \lambda_r, \kappa_r) \triangleq \mathrm{Pois}(z - \kappa_r; \lambda_r), \qquad z \ge \kappa_r \qquad (2.20)$$

which is referred to as the translated Poisson distribution [5].
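For completeness, a small sketch of how the marginal model (2.19)-(2.20) can be evaluated and sampled, using the quantities defined above ($\kappa_r = \lceil r/R \rceil$ and $\lambda_r = \mu_r - \kappa_r$, with the linear mean $\mu_r$ parameterized as in Section 2.3.3 below); the parameter values shown are placeholders, not fitted values from the thesis.

```python
import numpy as np
from scipy.stats import poisson

def model_params(r, theta1, theta2, R):
    """Minimum hop count kappa_r and Poisson mean lambda_r at distance r."""
    kappa = int(np.ceil(r / R))
    mu_r = theta1 * r + theta2 if r > 0 else 0.0   # linear mean, cf. Section 2.3.3
    return kappa, max(mu_r - kappa, 0.0)

def tpois_pmf(z, r, theta1, theta2, R):
    """Marginal pmf p_{Z_r}(z) of the hop count model, eq. (2.20)."""
    kappa, lam = model_params(r, theta1, theta2, R)
    return poisson.pmf(np.asarray(z) - kappa, lam)

def sample_hop_counts(r, theta1, theta2, R, size=1, rng=None):
    """Draw hop counts Z_r = kappa_r + Y_{lambda_r}, eq. (2.19)."""
    rng = rng or np.random.default_rng()
    kappa, lam = model_params(r, theta1, theta2, R)
    return kappa + rng.poisson(lam, size=size)

# Placeholder parameters: theta1 ~ hops per unit distance, theta2 ~ intercept.
print(sample_hop_counts(r=0.3, theta1=25.0, theta2=1.2, R=0.05, size=5))
```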
2.3.3 Maximum Likelihood Fit of the Hop Count Process
The maximum likelihood principle is used to obtain an estimate of a distribution parameter $\theta$, defined as the parameter value which maximizes the likelihood function. The maximum likelihood estimator (MLE) has the desirable property of being asymptotically efficient, that is, unbiased and approaching the Cramér-Rao lower bound on the variance of $\hat{\theta}$ for increasing sample sizes [58].
The mean (2.13) of the hop count model process $Z_r$ is assumed to increase linearly with the distance $r$ from the source node $x$, at an a priori unknown rate and intercept $(\theta_1, \theta_2) \in \Theta$, where $\Theta = \mathbb{R}_+ \times [1, \infty)$, which we want to estimate based on observations of the hop count at different sites, given $x$. The mean number of hops as a function of distance $r$ is approximated as

$$\mu_r = \begin{cases} 0 & \text{if } r = 0, \\ \theta_1 r + \theta_2 & \text{if } r > 0. \end{cases} \qquad (2.21)$$

The intercept $\theta_2 \ge 1$ reflects the fact that for any sensor node other than the source, the hop count must be at least one. Furthermore, at any distance $r$ the hop count must be greater than or equal to the minimum hop count given by (2.18).
An important property of the hop count process is that $\theta_1, \theta_2$ are invariant to the choice of the mean node-to-node passage time, which can be shown by a simple change of units of time. The process parameters depend only on the density and the transmission range of the sensor nodes. This implies that the MLE can be performed off-line at the network's design time, while the mean node-to-node message delay is allowed to vary (slowly) with operating conditions, such as the average energy harvested by the sensor nodes.
An observation $z$ with minimum hop count $\kappa$ is modeled by a translated Poisson variable $Z$ with probability mass function (2.20)

$$p_{Z_r}(z) = \frac{\lambda^{z - \kappa}}{(z - \kappa)!}\,\exp(-\lambda), \qquad (2.22)$$

where $\lambda = \mu - \kappa$ from (2.19). By substituting $\mu = \theta_1 r + \theta_2$ we obtain the pmf of the observation, parameterized by the rate $\theta_1$ and intercept $\theta_2$,

$$p_Z(z \mid r; \theta) = \frac{(\theta_1 r + \theta_2 - \kappa)^{z - \kappa}}{(z - \kappa)!}\,\exp\!\big( -(\theta_1 r + \theta_2 - \kappa) \big). \qquad (2.23)$$
We now consider a sample of $n$ independent (though in general, not identically distributed) observations $z_{1:n}$ with joint conditional probability

$$p(z_{1:n} \mid r_{1:n}; \theta) = \prod_{i=1}^{n} p_Z(z_i \mid r_i; \theta). \qquad (2.24)$$

For maximum likelihood estimation problems, it is often convenient to use the log-likelihood function, defined as the logarithm of (2.24) and interpreted as a function of the parameter vector $\theta$ given the observations,

$$\ell(\theta \mid z_{1:n}, r_{1:n}) = \ln p(z_{1:n} \mid r_{1:n}; \theta) \qquad (2.25)$$
$$= \sum_{i=1}^{n} \ln p_Z(z_i \mid r_i; \theta). \qquad (2.26)$$
With (2.23), the log-likelihood for our problem becomes

$$\ell(\theta \mid z_{1:n}, r_{1:n}) = \sum_{i=1}^{n} \Big[ (z_i - \kappa_i)\ln(\theta_1 r_i + \theta_2 - \kappa_i) - \ln(z_i - \kappa_i)! - (\theta_1 r_i + \theta_2 - \kappa_i) \Big]. \qquad (2.27)$$
If a maximum likelihood estimate exists, then it is

$$\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta \in \Theta} \ell(\theta \mid z_{1:n}, r_{1:n}). \qquad (2.28)$$

The partial derivatives of the log-likelihood function with respect to $\theta$ are zero at any local extremum. Hence, the maximum likelihood estimate is a solution of the system of likelihood equations

$$\frac{\partial \ell}{\partial \theta_1} = \sum_{i=1}^{n} \left( \frac{z_i - \kappa_i}{\theta_1 r_i + \theta_2 - \kappa_i} - 1 \right) r_i = 0$$
$$\frac{\partial \ell}{\partial \theta_2} = \sum_{i=1}^{n} \left( \frac{z_i - \kappa_i}{\theta_1 r_i + \theta_2 - \kappa_i} - 1 \right) = 0. \qquad (2.29)$$
Multiplication with the common denominator transforms (2.29) into a system of bivariate polynomials in $\theta_1$ and $\theta_2$

$$f(\theta_1, \theta_2) = 0$$
$$g(\theta_1, \theta_2) = 0 \qquad (2.30)$$

which has the same zeros as (2.29). By Bezout's theorem [105], this system has $\deg(f)\,\deg(g)$ zeros $(\theta_1, \theta_2) \in \mathbb{C}^2$, which can be easily computed by typical numerical solvers even for large polynomial degrees ($> 1000$), subject to the constraint that $(\theta_1, \theta_2) \in \Theta$, where $\Theta = \mathbb{R}_+ \times [1, \infty)$. For every zero satisfying the constraint, negative definiteness of the Hessian of $\ell$ must be ascertained for a local maximum of the likelihood to exist. The maximum likelihood estimate for our problem is identified with the solution $(\theta_1, \theta_2)$ which globally maximizes $\ell$, subject to $(\theta_1, \theta_2) \in \Theta$.
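As a quick numerical alternative to solving the polynomial system (2.30), the log-likelihood (2.27) can also be maximized directly with a general-purpose optimizer over $\Theta = \mathbb{R}_+ \times [1, \infty)$. The sketch below is illustrative only (it clips the Poisson mean to keep the objective finite rather than enforcing the positivity constraint exactly) and is not the estimation procedure used for the results in this chapter.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_log_likelihood(theta, z, r, kappa):
    """Negative log-likelihood, cf. (2.27); lambda_i clipped to stay positive."""
    theta1, theta2 = theta
    lam = np.maximum(theta1 * r + theta2 - kappa, 1e-12)
    return -np.sum((z - kappa) * np.log(lam) - gammaln(z - kappa + 1) - lam)

def fit_hop_count_model(z, r, R, theta0=(10.0, 1.0)):
    """ML estimate of (theta1, theta2) from hop counts z observed at distances r."""
    z, r = np.asarray(z, float), np.asarray(r, float)
    kappa = np.ceil(r / R)                              # minimum hop counts, eq. (2.18)
    res = minimize(neg_log_likelihood, x0=np.asarray(theta0),
                   args=(z, r, kappa), method="L-BFGS-B",
                   bounds=[(1e-6, None), (1.0, None)])  # Theta = R_+ x [1, inf)
    return res.x
```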
2.4 Simulation Results
2.4.1 Simulation Setup
The translated Poisson distribution was applied to model the empirical hop count
observed in simulated broadcast WSNs. For this purpose, our simulation generates
10³ realizations of random geometric graphs G on the unit square [0, 1]² ⊂ ℝ², where
the number of nodes is a Poisson variable with density ρ = 4000 nodes per unit
area. The source node is located at the center, a setup which minimizes bound-
ary effects. We consider mean node degrees ranging from 8 to 40, which
are representative of weakly as well as strongly supercritical networks (the criti-
cal mean node degree is ρ_c(R)πR² ≈ 4.52, based on a value of ρ_c(1) ≈ 1.44
for the empirical critical node density as given in [63] for R = 1). Node-to-node
delays are modeled by independent exponential random variables with unit mean.
The first-passage paths are computed as the minimum-delay paths from the source
to all nodes within a thin annular region defined by r ± δ. The simulation results
are not sensitive to the exact choice of δ, provided that δ < r/10. Based on the hop
counts of a sample of 10³ nodes at random distances from the source, we determine
the model process parameters λ₁, λ₂ using maximum-likelihood estimation, as de-
scribed in Section 2.3.3.
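For reference, one realization of this setup can be generated with a short Python sketch (assuming the numpy and networkx packages); the function name and the choice of seed are illustrative.

import numpy as np
import networkx as nx

def first_passage_hop_counts(rho=4000.0, degree=40.0, seed=0):
    # One WSN realization: random geometric graph on the unit square with a
    # Poisson node count, i.i.d. exponential link delays, and the hop counts of
    # the minimum-delay (first-passage) paths from a source node at the center.
    rng = np.random.default_rng(seed)
    R = np.sqrt(degree / (np.pi * rho))       # communication range for the given mean node degree
    n = rng.poisson(rho)                      # Poisson number of nodes
    pos = rng.random((n, 2))                  # uniform placement in [0, 1]^2
    pos[0] = (0.5, 0.5)                       # node 0 acts as the source at the center
    G = nx.random_geometric_graph(n, R, pos={i: tuple(p) for i, p in enumerate(pos)})
    for u, v in G.edges:
        G.edges[u, v]["delay"] = rng.exponential(1.0)    # unit-mean node-to-node delay
    paths = nx.single_source_dijkstra_path(G, 0, weight="delay")
    dist = {v: float(np.linalg.norm(pos[v] - pos[0])) for v in paths}
    hops = {v: len(p) - 1 for v, p in paths.items()}
    return dist, hops                         # source distance and first-passage hop count per node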
2.4.2 Hop Count Distribution
For selected distances r from the source, Figures 2.2, 2.3 and 2.4 show the cdfs
of the hop count, corresponding to mean node degrees of 8, 16 and 40, respec-
tively. We observe that the translated Poisson model closely approximates the
empirical hop count in strongly supercritical networks (mean node degree 40, Figure 2.4). For
weakly supercritical networks (mean node degree 8, Figure 2.2), the empirical hop count distri-
bution at larger values of r has a noticeably heavier upper tail compared to the
translated Poisson model. However, the fit improves quickly with increasing node
degree (mean node degree 16, Figure 2.3). To assess the goodness of fit as a function of mean
node degree and distance r from the source, we compute the Kullback-Leibler
divergence (KLD, [19]) between the empirical and the model distribution. Figure
2.5 indicates that the translated Poisson distribution is an increasingly good fit at
larger distances and higher mean node degrees. As the mean node degree varies
from 8 to 40, the rate of improvement is large at first and then gradually
decreases.
While the stochastic process model for the hop count was derived by relying on
properties of two-dimensional networks, which exhibit continuum percolation and
asymptotic properties of the hop count (2.6), the resulting process has no formal
dependency on the dimensionality of the problem. It is therefore reasonable to
evaluate the suitability of the translated Poisson distribution as a model for the hop
count in one-dimensional WSNs. To this end, we generate realizations of WSNs on
[0, r], by placing nodes uniformly at random with a density equal to the desired mean
node degree divided by 2R. The source node is located at 0. Communication range R
and node-to-node delays are defined as in the two-dimensional setting described earlier.
To relax constraints on the first-passage path between 0 and r, we extend the network
from [0, r] to [−A, r + A] (the simulation results show no sensitivity to this extension
for A ≥ 2R).
The cdfs shown in Figures 2.6, 2.7 and 2.8, corresponding to mean node de-
grees of 8, 16 and 40, respectively, demonstrate that the translated Poisson dis-
tribution is also a good fit for the empirical hop count in one-dimensional WSNs.
Similar to our observations in two-dimensional networks, the fit improves as the
distance and mean node degree increase.
2.4.3 Localization of Source Node
In this section, we evaluate the localization error which results from the application
of the translated Poisson distribution as a model for the hop count in a WSN. We
use the same network model and simulation setup as described in Section 2.4.2. A
node located at the center x_0 of the unit square [0, 1]² is the source of a broadcast
message. Hop count observations are made at randomly chosen nodes in the WSN,
conceivably by a mobile sink which can interrogate a node at its current position s
to obtain the node's hop count information. The mobile sink, which is assumed to
know its own position, then estimates the location of the source node, given all the
hop count observations.
A second, fictitious WSN serves as a baseline for comparison. In this network,
the hop count at an interrogated node a distance r away from the source node is
generated by drawing from the translated Poisson distribution Z ∼ TPois(z_i; μ, κ),
with μ = λ₁ r + λ₂ (2.21) and κ = ⌈r/R⌉ (2.18). These observations are idealized
in the sense that their statistics are characterized exactly by the observation model.
At the mobile sink, let a random variable X ∈ [0, 1]² describe the (unknown)
location of the source node and z_i ∈ ℕ be the hop count observed at position s_i ∈
[0, 1]², i = 1, . . . , n. The mobile sink applies Bayes' rule to compute the posterior
pdf of the source node location as

p_X(x | z_{1:n}) = η p_X(x) ∏_{i=1}^{n} TPois(z_i; μ_i, κ_i),    x ∈ [0, 1]²,    (2.31)

where η is a constant to normalize the posterior pdf, and p_X(x) is the prior distri-
bution of the source location. The parameters of the translated Poisson distribu-
tion are μ_i = λ₁ r_i + λ₂ (2.21) and κ_i = ⌈r_i/R⌉ (2.18), with the Euclidean distance
r_i = ‖s_i − x‖ between the source location hypothesis x and the mobile sink position
s_i. The parameters λ₁, λ₂ are determined off-line by maximum-likelihood estima-
tion (2.21). In practice, the a posteriori distribution p_X(x | z_{1:n}) is often computed
recursively, i.e. one observation at a time, by a Bayes filter

p_X(x | z_{1:k}) = η_k p_X(x | z_{1:k−1}) TPois(z_k; μ_k, κ_k)    ∀x,    (2.32)

where p_X(x | z_0) = p_X(x) is the prior distribution of the source location, which is
uniform in our simulation. In order for the Bayes filter to be computationally
tractable, we discretize the location hypothesis space [0, 1]² to a grid of J = 32×32
cells x_j, j = 1, . . . , J. This form of the Bayes filter is referred to as a grid filter [110].
Since in a grid filter the mode of the distribution is necessarily quantized, we use
the posterior mean as our estimate x̂ of the source location,

x̂ = ∑_{j=1}^{J} x_j p_X(x_j | z_{1:n}).    (2.33)

The a posteriori pdf of the source location tends to become unimodal and symmet-
ric as the number of randomly taken hop count observations increases, so that the
posterior mean is increasingly representative of the maximum a posteriori estimate
(MAP).
In the same fashion, we perform source node localization in the idealized, base-
line WSN and denote the resulting a posteriori pdf of the source location q_X(x | z_{1:n}).
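The recursion (2.32) and the estimate (2.33) translate directly into a small grid filter. The following Python sketch assumes a grid of M×M cell centers and known parameters λ₁, λ₂ and R; the helper and variable names are illustrative.

import numpy as np
from scipy.stats import poisson

def tpois_pmf(z, r, lam1, lam2, R):
    # Translated Poisson likelihood of hop count z at distance(s) r, cf. (2.22).
    kappa = np.ceil(r / R)
    rate = np.maximum(lam1 * r + lam2 - kappa, 1e-12)
    return poisson.pmf(z - kappa, rate)

def localize_source(obs, lam1, lam2, R, M=32):
    # obs is a list of ((sx, sy), z) pairs: sink positions and observed hop counts.
    # Returns the posterior over the M x M grid and the posterior-mean estimate (2.33).
    grid = (np.stack(np.meshgrid(np.arange(M), np.arange(M)), -1).reshape(-1, 2) + 0.5) / M
    belief = np.full(len(grid), 1.0 / len(grid))        # uniform prior
    for (s, z) in obs:
        r = np.linalg.norm(grid - np.asarray(s), axis=1)
        belief *= tpois_pmf(z, r, lam1, lam2, R)        # Bayes update (2.32)
        belief /= belief.sum()                          # assumes some cell has non-zero likelihood
    x_hat = (belief[:, None] * grid).sum(axis=0)        # posterior mean (2.33)
    return belief, x_hat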
Figure 2.9 shows the cdfs of the normalized localization error

e = ‖x̂ − x_0‖ / R    (2.34)

corresponding to p_X(x | z_{1:n}) (2.32) and q_X(x | z_{1:n}), with n = 8 randomly chosen
observations, for mean node degrees of 8 and 40.
Figure 2.2: Comparison of the translated Poisson model and the empirical first-passage hop count (with 95% confidence interval) in a two-dimensional network with mean node degree 8. CDFs shown for source-to-sink distances r ∈ {0.5R, R, 2R, 8R}.

We observe that the localization error becomes
smaller with increasing mean node degree. For a mean node degree of 8, the probability that the lo-
calization error exceeds 3R is approximately 0.075. This is likely due to the heavier
upper tail observed in the empirical hop count distribution for weakly supercritical
WSNs, as seen in Figure 2.2. It is known that the robustness of Bayesian inference
suffers if the distribution of the observation noise has heavier tails than the like-
lihood function modeling it [90]. For a mean node degree of 40, the probability that the localization
error exceeds 3R is less than 0.025.
Figure 2.3: Comparison of the translated Poisson model and the empirical first-passage hop count (with 95% confidence interval) in a two-dimensional network with mean node degree 16. CDFs shown for source-to-sink distances r ∈ {0.5R, R, 2R, 8R}.
2.5 Conclusion
In this chapter, we have proposed a new approach to model the hop count distri-
bution between a source and a sink node in broadcast WSNs, in which message
propagation is governed by first-passage percolation and node-to-node delays are
characterized by i.i.d. exponential random variables. We utilize the abstract model
of a stochastic jump process to describe the hop count. By making a simplifying
independence assumption and using a maximum entropy argument, the hop count
model process is shown to have a translated Poisson marginal distribution. Sim-
ulation results confirm that the empirical hop count distribution is generally well
approximated by this model. The simulation results also show that the error re-
sulting from the application of the translated Poisson model in a target localization
problem is small with high probability, although for node degrees approaching the
critical percolation threshold, the performance degrades. Due to its low computa-
tional complexity, we expect this model to be a good candidate for the observation
model in Bayesian target localization applications in low-cost WSNs which rely on
hop count information alone.
While the translated Poisson distribution is the most parsimonious result con-
sistent with our assumptions about the hop count process, the general form of the
jump component of the process is a compound Poisson distribution which, given
additional information or assumptions, may provide an improved fit of the empiri-
cal hop count.

Figure 2.4: Comparison of the translated Poisson model and the empirical first-passage hop count (with 95% confidence interval) in a two-dimensional network with mean node degree 40. CDFs shown for source-to-sink distances r ∈ {0.5R, R, 2R, 8R}.
Figure 2.5: Kullback-Leibler divergence (KLD) between the empirical distribution and the translated Poisson distribution, for mean node degrees 8, 12, 16, 24, 40 and at source-to-sink distances r ∈ {0.5R, R, 2R, 4R, 8R}. The KLD decreases with increasing mean node degree.
Figure 2.6: Comparison of the translated Poisson model and the empirical first-passage hop count (with 95% confidence interval) in a one-dimensional network with mean node degree 8. CDFs shown for source-to-sink distances r ∈ {0.5R, R, 2R, 8R}.
Figure 2.7: Comparison of the translated Poisson model and the empirical first-passage hop count (with 95% confidence interval) in a one-dimensional network with mean node degree 16. CDFs shown for source-to-sink distances r ∈ {0.5R, R, 2R, 8R}.
Figure 2.8: Comparison of the translated Poisson model and the empirical first-passage hop count (with 95% confidence interval) in a one-dimensional network with mean node degree 40. CDFs shown for source-to-sink distances r ∈ {0.5R, R, 2R, 8R}.
Figure 2.9: CDFs of the normalized localization error e = ‖x̂ − x_0‖/R, given hop count observations at 8 randomly selected nodes, for mean node degrees 8, 24, 40. We compare localization based on empirical first-passage hop counts with idealized hop counts (independent draws from the observation model).
Chapter 3
Rollout Algorithms for
WSN-assisted Target Search
3.1 Introduction
An autonomous, mobile platform is tasked with finding the source of a broad-
cast message in a randomly deployed network of location-agnostic wireless sensor
nodes. Messages are assumed to propagate by flooding, with random node-to-node
delays. In WSNs of this type, the hop count of the broadcast message, given the
distance from the source node, can be approximated by a simple parametric dis-
tribution. The autonomous platform is able to interrogate a nearby sensor node to
obtain, with a given success probability, the hop count of the broadcast message.
In this chapter, we model the search as an infinite-horizon, undiscounted cost,
online POMDP and solve it approximately through policy rollout. The cost-to-go
at the rollout horizon is approximated by a heuristic based on an optimal search
plan in which path constraints and assumptions about future information gains are
relaxed. This cost can be computed efficiently, which is essential for the application
of Monte Carlo methods, such as rollout, to stochastic planning problems.
We present simulation results for the search performance under different base
policies as well as for parallel rollout, which demonstrate that our rollout approach
outperforms methods of target search based on myopic or non-myopic mutual in-
formation utility. Furthermore, we evaluate the search performance for different
generative models of the hop count to quantify the performance loss due to the use
of an approximate observation model and the rather significant effect of statisti-
cal dependence between observations. We discuss how to explicitly account for
dependence by adapting an integer autoregressive model to describe the hop count.
3.1.1 Background and Motivation
Wireless sensor networks (WSNs) have been an object of growing interest in the
area of target localization. The achievable localization accuracy depends on many
factors, among which are the nature of the observed phenomena, sensor modalities,
the degree of uncertainty about sensor node locations as well as processing and
communication capabilities. Typically, sensor observations are processed in situ
and reduced to position estimates, which must be routed through the WSN via
multiple hops to dedicated sink nodes, in order to be accessible by the network's
user. In some applications, for example autonomous exploration or search and
rescue, an essential feature is that the position estimates reported by the WSN are
used to assist and guide a mobile sensor and actuator platform (or searcher, for
short) to the target. The main goal for the searcher is to make contact with the
target, for example to acquire large amounts of payload information from a sensor
that has observed and reported an event of interest, or to retrieve the target outright.
As an alternative to the use of WSNs as described, it is conceivable for a mobile
searcher in the deployment area of the WSN to interrogate nearby sensor nodes
directly to gather information, based on which the position of the target can be
estimated. Thereby, expensive computational and communication tasks can be
offloaded from the sensor nodes. A further possibility is to integrate information
obtained from the WSN with the searcher's on-board, perhaps more sophisticated,
sensing capabilities, thus mitigating the need for the WSN to perform very precise
localization. This can result in a significant simplification of the sensor node hard-
ware and software requirements and reduced node energy consumption. Consistent
with this goal, we assume that the sensor nodes are location-agnostic and unaware
of the distances from, or angles between neighbor nodes. The information made
available by the WSN to the searcher is assumed to be statistically related to the
target location, e.g. noisy measurements of the distance.
In this chapter, we consider the use of randomly deployed WSNs in which,
starting from a source node, messages are disseminated by flooding and the node-
to-node transmission delays are random. In such a network, provided that the
node density exceeds a critical threshold, a random cluster of all nodes that have
received the broadcast message grows over time, starting with the source node that
initiated the broadcast upon detection of an event of interest, in a process known as
rst-passage percolation [15]. For such a network, it was shown in Chapter 2 that
the message hop count distribution, parameterized by the distance from the source
node, can be approximated by a simple stochastic process model.
We are motivated by the problem of a mobile, autonomous searcher which is
given the task of locating a (generally moving) target, that is, the source node of
a broadcast message in a WSN of the type studied in Chapter 2, relying on ob-
servations of the message hop count alone. We focus on the question of how the
searcher can home in and make contact with the target in the shortest expected
time, given only the hop count observations. Making contact with the target is de-
fined here as observing a hop count of zero. The most general framework for this
type of optimal, sequential decision problem under uncertain state transitions and
state observations is the partially observable Markov decision process (POMDP)
[56, 69, 79, 110]. POMDPs optimally blend the need for exploration to reduce
uncertainty with making progress toward the goal state. Unfortunately, for all but
small problems (in terms of the sizes of the state, action and observation spaces),
solving POMDPs is computationally intractable. This is due to the fact that the
solution is quite general, and must be computed for every possible belief state. The
belief state (or information state) is the POMDP concept of expressing the uncer-
tainty about the true, unobservable state as a probability distribution over all states.
For search problems such as the one considered here, it can be more productive to
consider only the current belief state and plan the search with respect to the actu-
ally reachable belief space. This approach is referred to as online POMDP [85].
Online POMDPs have the additional advantage over offline solutions of being able
to (quickly) respond to changing model parameter values. However, in many prac-
tical applications, online POMDPs still present a formidable computational chal-
lenge, compounded by the need to operate in real-time. Therefore, online POMDPs
are most often solved approximately and hence suboptimally [17]. One class of
approximate techniques is based on Monte Carlo methods. Among these, pol-
icy rollout [8, 17] is characterized by a guarantee of improved performance over
the base policy on which the rollout relies. Furthermore, rollout algorithms lend
themselves to parallelization and have the anytime property, which means that the
execution can be stopped at any time, yielding an approximate solution. However,
the accuracy of the solution increases the longer the algorithm is allowed to run.
3.1.2 Chapter Contribution
This chapter makes the following contributions:
1. We propose a rollout-based algorithm which minimizes the expected time-
to-detection, to guide a mobile searcher to a target. The search is modeled as
an infinite-horizon, undiscounted cost, online POMDP and solved approx-
imately through policy rollout. As a sample-based method, rollout strictly
applies only to finite-horizon problems. To approximate the terminal cost-
to-go at the rollout horizon, we propose the use of a heuristic based on an
optimal search plan, where search path constraints and assumptions about
future information gains are relaxed. This terminal cost can be computed
efficiently in O(n log n) time, thus making rollout a viable solution approach
for this online planning problem. We show through simulation that our
rollout approach outperforms popular methods of target search based on a
myopic or non-myopic mutual information utility.
2. We show that a formulation of the target search problem in terms of an
explicit tradeoff between exploitation and exploration (referred to as info-
taxis [112]) is equivalent to a myopic mutual information utility.
3. An argument based on the contact distance in Poisson point processes moti-
vates the formulation of a lower bound on the expected search time for mul-
tiple searchers, which are assumed to be distributed uniformly at random, in
terms of the searcher density.
4. Simulations of a stationary target search indicate that the statistical depen-
dence between observations cannot be ignored without significant penalty.
We discuss mitigating strategies and propose an integer autoregressive pro-
cess as a model of the observation dependence. This model is derived by
adapting the INAR(1) process [1] to translated Poisson innovations.
3.1.3 Chapter Organization
The chapter is organized as follows: In Section 3.2, our system model is intro-
duced. The main contribution, a heuristic for the expected search time at the rollout
horizon, is derived in Section 3.4.2. Because of its use as a reference for perfor-
mance evaluation, myopic and non-myopic mutual information utilities are briey
reviewed in Section 3.5. We present simulation results in Section 3.7 that show the
performance improvement of the rollout algorithm over existing techniques. We
also discuss the loss of performance due to statistical dependence of the empirical
hop count observations, and present mitigating strategies, including a model for
the observation dependence, in Section 3.8. Conclusions for future work are drawn
in Section 3.9. To simplify the notation for probabilities, we omit the names of
random variables when this is unambiguous.
3.2 System Model
3.2.1 Wireless Sensor Network Model
Randomly deployed WSNs are frequently described by Gilbert's disk model [35,
40], which we adopt in this chapter. Without loss of generality, the deployment
area is assumed to be the unit square [0, 1]² ⊂ ℝ². Sensor nodes are distributed
according to a spatial Poisson point process P_ρ of density ρ, restricted to the
deployment area. Two sensor nodes are said to be linked if they are within com-
munication range R of each other. The sensor nodes P_ρ and their communication
links E_R form a random geometric graph G = (P_ρ, E_R) [81] with mean node de-
gree ρπR². The mean node degree must exceed a critical threshold for a large
portion of the network to be connected.
In order to simulate a mobile searcher making hop count observations while
operating in the deployment area of such a WSN, we discretize both the searcher
position and the sensor node coordinates. Rather than placing sensor nodes ran-
domly in [0, 1]² according to a Poisson point process, we restrict possible node lo-
cations to a finite set of N = M² grid coordinates, C = {0, 1/M, . . . , (M−1)/M}²,
indexed by the set X = {1, . . . , N}. We refer to X as the set of cells. A cell is
occupied by a sensor node with the uniform probability p. The resulting random
node placement over C approximates a Poisson point process with density ρ = pN
restricted to [0, 1]², and converges to a Poisson point process in the limit of de-
creasing cell size (this is one way to define a Poisson point process).
The sensor node in cell x which initiates a message broadcast upon the pre-
sumed detection of an event of interest, is designated as the target. Every node
forwards the message to its neighbors within the communication range R, and du-
plicates of the message are discarded in order to avoid unnecessary retransmissions.
Each node-to-node link has an associated transmission delay. We assume that the
node-to-node delays are independent, exponentially distributed random variables
with a common mean. This is a reasonable assumption for networks where com-
munication is unreliable and retransmissions are required, or where sensor nodes
harvest energy from unreliable environmental sources and may be dormant for ran-
dom periods. Furthermore, we assume that interference between transmissions is
negligible. In such a network, the observed hop count at the node in cell x′ is the
hop count of the first-passage path, that is the path

π* = argmin_{π ∈ Π(x, x′)} T_d(π)    (3.1)

which minimizes the message transmission delay T_d(π) for all paths π ∈ Π(x, x′)
connecting nodes x and x′. In Chapter 2 it was established that for networks with
strongly supercritical mean node degree, the hop count at distance r from the
target is well approximated by a translated Poisson distribution,

f_Z(z; r) = Pois(z − κ(r); λ(r)),    z ≥ κ(r),    (3.2)

where λ(r) = μ(r) − κ(r), with a mean hop count given by μ(r) = λ₁ r + λ₂ and
minimum hop count κ(r) = ⌈r/R⌉. The parameters λ₁, λ₂ can be learned in various
ways, for example through maximum likelihood estimation as described in Section
2.3.3. It is worth noting that λ₁, λ₂ are invariants of the common mean node-to-
node passage time (which in turn may be sensitive to operating conditions, such
as the average available energy), i.e. λ₁, λ₂ only depend on the network design
parameters.
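A hop count observation under this model can be drawn with a few lines of Python; the function name and the example parameter values are purely illustrative.

import numpy as np

def sample_hop_count(r, lam1, lam2, R, rng=None):
    # Draw a hop count from the translated Poisson model (3.2) at distance r > 0.
    rng = np.random.default_rng() if rng is None else rng
    kappa = int(np.ceil(r / R))                # minimum hop count kappa(r)
    lam = max(lam1 * r + lam2 - kappa, 0.0)    # rate lambda(r) = mu(r) - kappa(r)
    return kappa + rng.poisson(lam)

# e.g. sample_hop_count(r=0.25, lam1=20.0, lam2=1.0, R=0.056)   (illustrative values)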
In the simulations, we use different generative models of the hop count. This
enables the independent study of the relative performance of various search algo-
rithms and further effects due to non-ideal observations.
Definition 3.1. We define the following generative models of the hop count:
M1: observations are the first-passage hop counts in a WSN realization which is
modeled as a random geometric graph with exponentially distributed node-
to-node passage times, and are referred to as empirical hop counts,
M2: observations are first-passage hop counts, but are randomly sampled from
multiple, independent realizations of WSNs. Hop counts generated by this
model are referred to as independent, empirical hop counts,
M3: observations are independent draws from the translated Poisson model (3.2),
and are referred to as idealized hop counts.
It was verified through simulations of the generative models M1 and M3 in
Section 2.4.3, that the smallest average target localization error is achieved using
the idealized hop count observations generated by M3. In Section 2.4.3, the obser-
vations are taken at randomly selected nodes of the WSN. With increasing mean
node degree of the WSN, the localization error performance of M1 approaches that
of M3. In this chapter, simulations are designed to evaluate the search-time perfor-
mance for different generative models, whereby the observations are constrained
to be made along the path of the mobile searcher.
3.2.2 Autonomous Mobile Searcher
The mobile searcher acts by moving among the cells defined by the grid coordi-
nates C = {c_x}_{x∈X}. In each visited cell, the searcher attempts to acquire the hop
count of the broadcast message. If the cell is occupied by a sensor node, the mes-
sage hop count is observed with probability q. Otherwise, the searcher receives
no response at all. The target is said to be detected when a hop count of zero is
observed. Time is assumed to be discrete. At every time step k, the searcher may
decide to stay in the cell it is currently visiting, or move to one of the four neighbor
cells. Any searcher action incurs a cost, that is, an increment of search time.
3.2.3 Formulation of Target Search Problem
A POMDP is a general model for the optimal control of systems with uncertain
state transitions and state observations [69, 110]. In target search problems for-
mulated as POMDPs, the target motion is modeled as a Markov chain, searcher
actions may have uncertain effects, and only noisy observations of at least one
state variable are available. Since a focus of this chapter is the study of a hop count
observation model applied to target search, we can restrict attention to a stationary
target and assume that the searcher position is completely determined by the ac-
tions (that is, the searcher position is known without ambiguity). It is worth point-
ing out that certain instances of search problems can be modeled as multi-armed
bandits, for which index policies exist under suitable conditions [71, 97]. In this
chapter, online methods of policy rollout will be used to compute an approximate,
suboptimal solution for the target search problem.
Any POMDP can be defined in terms of a tuple ⟨S, A, T, J, Z, O⟩, where S
is the state space, A the set of admissible actions, T the state transition function, J
the cost function (commonly, the reward function R is specified instead), Z is the
set of observations and O defines the state observation law.
Definition 3.2. The POMDP for the target search problem is defined by

• S = X × Y is a finite, joint state space, where X is the range of the par-
tially observed target position and Y = X the range of the searcher position.
A state is represented by the vector s = (x, y)^T.

• A = {A_y}_{y∈Y} is a family of action sets indexed by the set of all possible
searcher positions. An action set A_y ⊆ {stay, north, east, south, west, D} de-
fines the possible moves the searcher can make from its current position y in
the next time step, and is augmented by an action D denoting detection. If
the target has not been detected by time k, the searcher's next action is con-
strained by a_{k+1} ∈ A_{y_k} \ {D}. If the target is detected at time k (by observing
a message hop count of zero), then a_{k+n} = D, for all n > 0.

• T : S × A_y × S → [0, 1] is the transition kernel describing the state dynam-
ics. In our model, where the target is assumed to be a stationary sensor node
and the searcher position is completely determined, we have

T(s, a, s′) ≡ Pr{S_{k+1} = s′ | S_k = s, a}    (3.3)
           = { δ_{x′,x} δ_{y′,y}, if a = D;   δ_{x′,x} δ_{y′,y+a}, otherwise }.    (3.4)

• J : A_y → ℝ specifies the cost of executing the action a,

J(a) ≡ { 0, if a = D;   1, if a ∈ A_y }.    (3.5)

• Z = ℕ_0 ∪ {∅} is the set of observations of message hop counts, ℕ_0, aug-
mented by a possibility of making no observation, denoted ∅.

• O : S × A_y × Z → [0, 1] is the hop count observation model, defined as

O(s, a, z) ≡ { 1, if a = D;   Pr{Z = z | s}, otherwise },    (3.6)

where

Pr{0 | s} = { q, if x = y;   0, otherwise },    (3.7)
Pr{∅ | s} = { 1 − q, if x = y;   1 − qp, otherwise },    (3.8)
Pr{n | s} = { 0, if x = y;   q p f_Z(n; r(s)), otherwise },    (3.9)

for n > 0. Here, f_Z(n; r(s)) is the translated Poisson distribution (3.2), and
r(s) = ‖c_x − c_y‖₂ is the Euclidean distance between target and searcher po-
sitions.
Figure 3.1: Dynamic Bayes Network representing a POMDP
Since the state is only partially observable, the mobile searcher maintains a
belief state, that is, a joint probability mass function b(s) = Pr{S = s} of the state
variable S, to represent the uncertainty about the true state. Due to independence,
b(s) = b_X(x) b_Y(y). The belief state b_k at time k is a sufficient statistic [110] of all
past history of actions a_{1:k} = (a_1, . . . , a_k) and observations z_{1:k} = (z_1, . . . , z_k), given an
a priori belief state b_0 with b_{X,0}(x) = 1/N and b_{Y,0}(y) = δ_{y,y_0}. The belief state has
the Markov property: after taking an action and making an observation, the new
belief state is conditionally independent of the past, given the previous belief state,

b_k(s) = Pr{s | z_{1:k}, a_{1:k}, b_0} = Pr{s | z_k, a_k, b_{k−1}}.    (3.10)

The a posteriori belief state can therefore be updated recursively by applying
Bayes' rule,

b_k(s) = η O(s, a_k, z_k) ∑_{σ∈S} T(σ, a_k, s) b_{k−1}(σ)    (3.11)

where the normalization factor η is given by

η^{−1} = ∑_{s∈S} O(s, a_k, z_k) ∑_{σ∈S} T(σ, a_k, s) b_{k−1}(σ).    (3.12)

We use the following shorthand notation for the belief state update:

b_k = B(b_{k−1}, a_k, z_k)    (3.13)
Any POMDP can be represented graphically as a dynamic Bayesian network (Fig-
ure 3.1) which is a model of the relevant variables and their relationships.
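Because the target is stationary and the searcher position is known, the update (3.11)-(3.13) reduces to a reweighting of the target belief b_X by the observation law (3.7)-(3.9). The following Python sketch is one possible implementation; the function and variable names are illustrative.

import numpy as np
from scipy.stats import poisson

def tpois_pmf(z, r, lam1, lam2, R):
    # Translated Poisson pmf (3.2) of hop count z at distance(s) r.
    kappa = np.ceil(r / R)
    lam = np.maximum(lam1 * r + lam2 - kappa, 1e-12)
    return poisson.pmf(z - kappa, lam)

def belief_update(b_x, y, z, cells, p, q, lam1, lam2, R):
    # b_k = B(b_{k-1}, a_k, z_k): reweight the target belief after visiting cell
    # index y and receiving observation z (a positive hop count, 0, or None).
    r = np.linalg.norm(cells - cells[y], axis=1)     # distances to all target hypotheses
    like = np.empty_like(b_x)
    if z is None:                                    # no response, (3.8)
        like[:] = 1.0 - q * p
        like[y] = 1.0 - q
    elif z == 0:                                     # detection, (3.7)
        like[:] = 0.0
        like[y] = q
    else:                                            # hop count z > 0, (3.9)
        like = q * p * tpois_pmf(z, r, lam1, lam2, R)
        like[y] = 0.0
    posterior = like * b_x
    return posterior / posterior.sum()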
Given a stationary policy, i.e. a mapping π : B → A_y from the space B of
belief states to actions in the set of admissible actions, which governs the action
selection of the searcher, the expected infinite-horizon, undiscounted cost is

J^π(b_k) = E[ ∑_{n=1}^{∞} J(a_{k+n}) | b_k ]    (3.14)

where a_{k+n} = π(b_{k+n−1}) and the expectation is with respect to the b_{k+n} for n ≥ 1.
It has been shown in [92] for a path-constrained search problem, that the infinite-
horizon, undiscounted cost is finite under suitable assumptions. To make this prob-
lem tractable for rollout, we introduce the terminal cost

J_0(b, a) = { 0, if a = D;   T_h(b), otherwise },    (3.15)

where T_h(b) is a heuristic cost which approximates the expected search time at the
rollout horizon under relaxed assumptions. The heuristic cost will be derived in
Section 3.4, resulting in the following approximate cost-to-go

J^π_H(b_k) = E[ J_0(b_{k+H}, a_{k+H}) + ∑_{n=1}^{H} J(a_{k+n}) | b_k ].    (3.16)

It is convenient to define the Q-function [110], which quantifies the cost of taking
the action a in belief state b, and selecting actions according to policy π thereafter,

Q^π(b, a) = J(a) + E_{b′}[ J^π_{H−1}(b′) | b, a ]    (3.17)

where b′ = B(b, a, z) (3.13), i.e. b′ is the belief state conditioned on the next action
a and observation z.
3.3 Approximate Online Solution of POMDP by Rollout
3.3.1 Rollout Algorithm
In a rollout algorithm, the expectation in (3.17) is approximated as a sample aver-
age of the cost, obtained through Monte Carlo simulation of a fixed number, W, of
system trajectories (Algorithm 1). A trajectory starts in sample state s ∼ b_k, takes
observation z_{k+1} ∼ Pr(Z = z | s, a_{k+1}) and selects the next action a_{k+1} according to
the base policy π. Observation and action selection are repeated up to the rollout
horizon H, with time step n incurring the cost J(a_{k+n}). At the horizon, the ter-
minal, heuristic cost J_0(b_{k+H}, a_{k+H}) is evaluated and added. The effective rollout
policy π′ is then obtained (implicitly) by selecting the action which minimizes the
average cost,

a_{k+1} = π′(b_k) = argmin_{a ∈ A_{y_k} \ {D}} Q^π(b_k, a).    (3.18)

It is guaranteed that the rollout policy π′ is an improvement over the base policy,
in the sense that for the cost, we have J^{π′}_H(b) ≤ J^π_H(b). Rollout is effectively a
single-step policy iteration [48]. It should be noted, however, that an improvement
in a heuristic cost does not generally imply an improvement in the true, infinite-
horizon, undiscounted cost. The base policy is typically chosen such that the
action selection is computationally very simple. In this chapter, we adopt a number
of different base policies and evaluate their search time performance by simulation.
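To make the procedure concrete, the following Python pseudocode-style sketch outlines one decision step of the rollout; sample_state, sample_observation, move, admissible_actions, belief_update, terminal_cost and base_policy are placeholders for the corresponding components of Definition 3.2 and Section 3.4, not the actual implementation of Algorithm 1.

def rollout_action(b, y, base_policy, H, W, rng):
    # Estimate Q(b, a) of (3.17) for each admissible action as a sample average
    # over W simulated trajectories of length H, then pick the minimizer (3.18).
    q_values = {}
    for a in admissible_actions(y):
        total = 0.0
        for _ in range(W):
            x = sample_state(b, rng)                    # s ~ b_k (target cell, held fixed)
            b_sim, y_sim, a_sim, cost, detected = b, y, a, 0.0, False
            for _ in range(H):
                cost += 1.0                             # unit cost per action (3.5)
                y_sim = move(y_sim, a_sim)
                z = sample_observation(x, y_sim, rng)   # z ~ O(s, a, .), (3.6)-(3.9)
                if z == 0:                              # target detected: trajectory ends
                    detected = True
                    break
                b_sim = belief_update(b_sim, y_sim, z)
                a_sim = base_policy(b_sim, y_sim, rng)  # base policy pi
            if not detected:
                cost += terminal_cost(b_sim, y_sim)     # heuristic J_0 = T_h(b), (3.15)
            total += cost
        q_values[a] = total / W
    return min(q_values, key=q_values.get)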
3.3.2 Parallel Rollout
A generalization of the rollout approach, which uses multiple base policies π
from a predefined set of policies, Π, and is referred to as parallel rollout, was
introduced in [16]. In parallel rollout, the Q-function is computed as

Q^Π(b, a) = J(a) + E_{b′}[ min_{π∈Π} J^π_{H−1}(b′) | b, a ].    (3.19)

Parallel rollout is based on the intuition that multiple base policies allow a more
thorough exploration of the reachable belief space than a single base policy alone.
It is shown in [16] that the finite-horizon cost for parallel rollout is no higher than
the lowest cost achievable by any of the component policies π ∈ Π. However,
with a heuristic cost, this guarantee does not imply an improvement of the actual
performance, and simulation must be used to evaluate the search time for different
policy sets Π.
3.4 Heuristics for the Expected Search Time
Heuristics can be used to approximate the expected cost-to-go of a POMDP. The
use of a heuristic is appropriate when the computation of the exact Q-function
is impractical, but an approximation of the expected cost-to-go can be obtained
by making reasonable, simplifying assumptions about the future behavior of the
system under consideration [17]. Heuristics can be either generic or application-
specic. As an example of a generic heuristic, the QMDP [67] is based on the
assumption that the state of the system becomes observable after the next time
step, thus reducing the problem to the solution of an MDP. The performance re-
sulting from the application of such a generic heuristic can vary drastically with
the specific application. The quality of a heuristic for a given problem can be
improved, when its design incorporates expert knowledge about the application
domain. Any performance gain due to the improved heuristic, however, comes at
the expense of general applicability.
In this section, we derive heuristics for the terminal cost at the rollout horizon,
which approximate the remaining expected search time for the mobile searcher.
The heuristics themselves are based on solutions of relaxed, optimal search prob-
lems.
A first observation about the future behavior of the system, prevalent especially
at the later stages of the search, is that the hop count information available to the
searcher tends to get exhausted, as each sensor node conveys at most a single hop
count observation. In this section we will therefore assume that no useful informa-
tion in the form of hop count observations z >0 can be obtained beyond the rollout
horizon. Furthermore, we propose that the searcher action constraints in Definition
3.2 (collectively referred to as search path constraint) can be relaxed when the cells
of high probability of containing the target are spatially concentrated during the
later stages of the search, thus requiring the searcher to travel only short distances.
3.4.1 Constrained Search Path
In this section, we consider a search time heuristic based on the assumption that
the searcher will not be able to acquire useful hop count information (other than
detecting the target, defined as z = 0) beyond the rollout horizon. This assumption
leads to the requirement of solving an optimal, path-constrained search problem.
The path-constrained search is a computationally challenging problem, which has
been shown to be of NP-complete complexity [111].
A number of algorithms for the solution of optimal, path-constrained search
problems, involving both static and moving targets, have been proposed and an-
alyzed by various authors. These include dynamic programming to compute an
offline solution of a POMDP [27, 92], as well as branch-and-bound techniques
to solve linear or non-linear integer program formulations of the search problem
[28, 38, 68, 86]. In most of these approaches, the optimization goal is chosen to
be the maximization of the probability of detecting the target, subject to a cost
constraint (such as a nite search time horizon). The objective in our application
is to minimize the expected search time. We show in Appendix C how to formu-
late this problem as an integer linear program and compare its performance to the
corresponding integer linear program for the maximum probability of detection.
Unfortunately, the computational complexity of the path-constrained search
problem prevents its use as a heuristic to approximate the future expected search
time in a Monte Carlo based rollout algorithm. In the following section, we con-
sider the relaxation of the path constraint, resulting in a heuristic which lends itself
to an efficient solution approach.
3.4.2 Relaxation of Search Path Constraint
As the search progresses and more hop count observations are gathered, we gen-
erally find that the target pmf tends to concentrate around the most likely target
position. At this stage of the search, the searcher will be traveling shorter distances
and dwell longer in cells with high probability of containing the target, so that the
path constraint becomes less imposing.
These considerations motivate the construction of a heuristic through relax-
ation of the search path constraint which limits the searcher to stay in the current
cell or move to an immediate neighbor cell. We will further assume that no useful
information in the form of hop count observations z > 0 can be obtained beyond
the rollout horizon.
At the rollout horizon, let the target location be described by the belief state
b_X. Recall that the detection probability per look is q, given that the target node is
in the visited cell. We derive the heuristic by starting with a discrete search plan
not subject to a path constraint [103]. Let l(x, τ−1) denote the number of looks
that have been placed in cell x so far, out of a total of τ−1 looks. The detection
function for the τ-th look, conditioned on cell x containing the target, is given by
the geometric distribution

β(x, τ) = q(1−q)^{l(x,τ−1)}.    (3.20)

The probability of failing to detect the target in τ−1 looks and succeeding on the
τ-th look, placed in cell x, is therefore

b_X(x) β(x, τ).    (3.21)

Under the assumption that each look incurs the same cost of one search time incre-
ment, the optimal search plan, that is, the sequence of looks which minimizes the
expected search time, is shown in [103] to be

ξ_τ = argmax_{x∈X} b_X(x) β(x, τ),    (3.22)

that is, the next look is always placed in the cell with the highest probability of
detecting the target (with ties broken arbitrarily). The search commences in the cell
of highest prior target probability b_X(x) and, if the target is not detected, continues
to look in this cell until for some x′ ∈ X,

b_X(x′) β(x′, τ) ≥ b_X(x) β(x, τ).    (3.23)

The search then expands by allocating search effort to both cells x and x′. The
expected search time is

T_h(b) = ∑_{τ=1}^{∞} τ b_X(ξ_τ) β(ξ_τ, τ).    (3.24)

Computing this expectation, however, involves the optimal discrete search plan
ξ (an infinite sequence) which can only be evaluated recursively. We show that,
if the assumption of searching by discrete looks is relaxed, the expected search
time can be computed in explicit form. In order to allow the search effort to be
applied continuously rather than in discrete increments, the search parameter u,
also referred to as search intensity, is introduced in [53] and defined by

q = 1 − exp(−1/u),    (3.25)

where q is the detection probability per discrete look. Let b_X(x), x ∈ X be sorted
in decreasing order, such that b_X(x_1) ≥ . . . ≥ b_X(x_N), and define b_X(x_{N+1}) = 0.
The optimal continuous time search [103] starts in cell x_1 at t_1 = 0, and generally
expands from n to n+1 cells after a dwell time of Δt_n = t_{n+1} − t_n, given by

b_X(x_{n+1}) = b_X(x_n) exp( −Δt_n / (nu) ),    (3.26)

where the search effort parameter nu reflects the expanding search which involves
more and more cells. It is shown in [53] that the optimal search is equivalent to
the greedy maximization of the entropy of the target pmf.
Since the b_X(x) are sorted in decreasing order, there exists an m ≤ N for which
b_X(x_m) > 0 and b_X(x_n) = 0 for n > m. The sequence t_n is given through (3.26) by
the recursion

t_{n+1} = t_n + nu log( b_X(x_n) / b_X(x_{n+1}) ),    n = 1, . . . , m.    (3.27)

An explicit expression for t_n is

t_n = u log( ∏_{i=1}^{n} b_X(x_i) / b_X(x_n)^n ),    n = 1, . . . , m.    (3.28)
As the search never expands to involve more than m cells, the final dwell time
interval Δt_m is unbounded above, i.e. t_{m+1} → ∞. The expected search time for this
problem is

T_h(b) = u^{−1} ∑_{n=1}^{m} ∫_{t_n}^{t_{n+1}} τ b_X(x_n) exp( −(τ − t_n)/(nu) ) dτ,    (3.29)

where the factor u^{−1} normalizes the continuous pdf. With (3.26), the expected
search time with the proposed heuristic can be expressed as

T_h(b) = ∑_{n=1}^{m} n [ b_X(x_n)(t_n + nu) − b_X(x_{n+1})(t_{n+1} + nu) ],    (3.30)

where lim_{t_{m+1}→∞} b_X(x_{m+1})(t_{m+1} + nu) = 0. The cell y, visited by the searcher at the
rollout horizon, does not generally coincide with the cell x_1 of maximum probabil-
ity of containing the target. It can be argued that, if the distribution b_X is relatively
concentrated around the cell x_1, the searcher has to travel a number of time steps
equal to

Δ(b) = M ‖c_y − c_{x_1}‖_1    (3.31)

before encountering cells of significant probability of containing the target. This
can be interpreted as a limited form of a search path constraint. The extra cost in
terms of search time is included in the heuristic for the expected search time at the
rollout horizon (3.30),

T_h(b) = Δ(b) + ∑_{n=1}^{m} n [ b_X(x_n)(t_n + nu) − b_X(x_{n+1})(t_{n+1} + nu) ].    (3.32)
The heuristic search time (3.32) can be computed efficiently. Due to the requirement
to sort the m non-zero probability values contained in b_X, the run time is of the or-
der O(m log m), that is, the complexity of the best known sort algorithms [62]. The
remaining steps of computing (3.32) can be performed in linear time (or logarithmic
time when fully parallelized). Speed gains of several orders of magnitude over the
integer linear programming approach in Section 3.4.1 are typical.
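The computation of (3.28) and (3.30)-(3.32) can be written compactly; the following Python sketch assumes the target belief b_x over an M×M grid with cell centers cells and the searcher in cell index y, and the variable names are illustrative.

import numpy as np

def heuristic_search_time(b_x, cells, y, q, M):
    # Heuristic expected search time T_h(b) at the rollout horizon, per (3.32).
    u = -1.0 / np.log(1.0 - q)                  # search intensity from q = 1 - exp(-1/u), (3.25)
    p = np.sort(b_x[b_x > 0])[::-1]             # non-zero cell probabilities, decreasing
    m = len(p)
    n = np.arange(1, m + 1)
    t = u * (np.cumsum(np.log(p)) - n * np.log(p))   # switching times t_n, (3.28)
    p_next = np.append(p[1:], 0.0)              # b_X(x_{n+1}); the limit term of (3.30) vanishes
    t_next = np.append(t[1:], 0.0)
    T = np.sum(n * (p * (t + n * u) - p_next * (t_next + n * u)))   # (3.30)
    x1 = int(np.argmax(b_x))                    # most likely target cell
    delta = M * np.sum(np.abs(cells[y] - cells[x1]))                # travel offset (3.31)
    return delta + T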
3.5 Information-Driven Target Search
The search objective can be described informally as a reduction of the initial uncer-
tainty about the target location. Consequently, algorithms that maximize a suitable
measure of information gain between the a priori and a posteriori belief states,
though suboptimal, are readily applicable to many search problems. We consider
here the maximization of the mutual information, which is a special case of the
more general Rényi divergence [17]. Many information-driven algorithms have
been proposed, although for some variants [74, 112] it can be shown, as in Section
3.5.2, that they reduce to the basic form of maximization of the mutual information.
Differences between algorithms relate more often to the belief state representation,
for example as grid-based, mixture or particle filters, and the approximations used
to compute the information gain [43, 78, 87, 93, 94].
Due to its widespread application, we use information-driven search as a base-
line against which to evaluate the search-time performance of the proposed heuristic
rollout algorithms.
3.5.1 Mutual Information Utility
Information-driven target search algorithms can be classified broadly into methods
based on a myopic [31, 43–45, 93, 110] or non-myopic [87, 94] mutual informa-
tion utility. It is characteristic for all of these methods to select the action which
maximizes the mutual information between the current state S_k ∼ b_k and the future
observations {Z_{k+n}}_{n=1,...,H}, that is

a_{k+1} = argmax_{a ∈ A_{y_k}} I(X_k; Z_{k+1:k+H} | a).    (3.33)

The mutual information is defined [19] as the difference between the entropy of
the state, and the conditional entropy of the state given the observations (omitting
time index k)

I(S; Z_{1:H} | a) = H(S) − E_{Z_{1:H}}[ H(S | z_{1:H}, a) ].    (3.34)

In the myopic case, i.e. for H = 1, the mutual information can be computed exactly
based on its definition,

I(S; Z | a) = H(S) − ∑_{z∈Z} Pr{Z = z | a} H(S | z, a).    (3.35)

For H > 1 however, computing the expectation in (3.34) quickly becomes imprac-
tical, as the width of the POMDP policy tree, (|A| |Z|)^H, grows exponentially in
the horizon. By applying Monte Carlo techniques, the expectation in (3.34) is re-
placed by a sample average. Therefore, the rollout algorithm (Algorithm 1) lends
itself to computing a sample average of the information gains of a number, W, of
system trajectories. As in Section 3.2.2, a trajectory starts in a sample state s ∼ b,
takes an observation z ∼ Pr(Z = z | s, a) and selects the next action according to the
base policy π. The sample average of the information gain H(X_k) − H(X_k | z_{1:H})
over W trajectories is then interpreted as an approximation of the non-myopic mu-
tual information under base policy π (Algorithm 3). The rollout policy is obtained
implicitly by selecting the action which maximizes the approximate mutual infor-
mation utility,

a_{k+1} = argmax_{a ∈ A_{y_k}} Î_π(X_k; Z_{k+1:k+H} | a).    (3.36)
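For the myopic case H = 1, (3.35) can be evaluated directly once the observation law (3.6)-(3.9) is available. The following Python sketch assumes a finite (truncated) set of possible observations and a placeholder obs_likelihood(z, y_next) returning Pr{Z = z | x, y_next} for every target hypothesis x; the names are illustrative.

import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def myopic_mutual_information(b_x, y_next, obs_values, obs_likelihood):
    # Myopic mutual information I(S; Z | a) of (3.35) for the searcher position
    # y_next reached by a candidate action a.
    h_prior = entropy(b_x)
    mi = 0.0
    for z in obs_values:                       # includes the 'no response' symbol
        like = obs_likelihood(z, y_next)
        pz = float(np.sum(like * b_x))         # Pr{Z = z | a}
        if pz > 0.0:
            posterior = like * b_x / pz
            mi += pz * (h_prior - entropy(posterior))
    return mi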
3.5.2 Infotaxis and Mutual Information
A generalized information-driven search strategy referred to as infotaxis was
proposed in [112], in which the optimal searcher behavior is characterized as a
tradeoff between the need to perform exploitative and exploratory, information-
gathering actions, motivated by concepts from reinforcement learning [106]. This
tradeoff is reflected in the infotactic utility function used in [112] and [74], which
is a weighted sum of terms favouring either greedy or exploratory behaviour.
It can be shown that the infotactic utility can be reduced to the myopic mutual
information (3.34) with H = 1. A proof for this equivalence is given in Appendix
D. Infotaxis offers an interesting viewpoint by showing explicitly that maximiz-
ing the mutual information inherently balances exploitation and exploration in an
optimal way, when the overall goal is the entropy reduction of the belief state.
3.6 A Lower Bound on Search Time for Multiple
Searchers
It has been demonstrated that multiple cooperating searchers can achieve a much
improved search time performance [32, 36, 46, 49, 73, 89]. This improvement
can be interpreted as a form of diversity gain. Other factors being equal (such
as the observation model), the search time therefore depends to a large extent on
the initial, spatial distribution of the searchers and the degree of their cooperation.
Ideally, the distributed searchers share the same information, or belief state, which
is formally equivalent to a centralized approach. In more realistic models however,
communication constraints limit the amount of, or the rate at which information
may be shared between searchers.
Since deploying and operating the searchers is expensive, it is of significant
interest to characterize mean search times as a function of the number of searchers.
A trivial lower bound on the search time is the time for the searcher closest to the
target to reach it, provided that the target position is known. Assuming that the
searchers are initially distributed uniformly at random in the plane, and that the
time for multiple cooperating searchers to obtain an unambiguous x on the target
is small compared to the time required to reach it, we obtain a theoretical lower
bound on the search time as a function of the number of searchers per unit area. In
our application, this is a reasonable assumption in light of the results in Chapter 2
which show that generally, only a few observations from random positions suffice
to achieve good target localization.
We assume furthermore that all searchers travel at the same, constant speed v,
so that the searcher closest to the observed target will reach it first.
Under these assumptions, the mean search time can be expressed in terms of
the contact distance in a spatial Poisson point process P_φ representing the initial
positions of the searchers. The contact distance is a well-defined random variable,
given by the shortest distance from the (fixed) target x to the closest searcher and
denoted d(x, P_φ) [4]. The distribution function of the contact distance is

Pr{d(x, P_φ) < r} = 1 − exp(−φπr²)    (3.37)

and its mean is

E[d(x, P_φ)] = (1/2) √(1/φ).    (3.38)

With the mean contact distance, we obtain the lower bound on the mean search
time as a function of the searcher density per unit area, φ, as

E[T_search] ≥ (1/(2v)) √(1/φ).    (3.39)

This lower bound offers the insight that in order to reduce the expected search time
by a factor of a through the deployment of multiple searchers, the density of the
searchers must be increased by at least a².
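As a small numerical illustration of (3.39) (with made-up values of the searcher density and speed, not results from this thesis):

import math

def search_time_lower_bound(searcher_density, speed):
    # Lower bound (3.39) on the expected time for the nearest of uniformly
    # distributed searchers to reach a known target position.
    return 0.5 / speed * math.sqrt(1.0 / searcher_density)

# Doubling the density shortens the bound only by a factor of sqrt(2):
# search_time_lower_bound(4.0, 1.0) == 0.25;  search_time_lower_bound(8.0, 1.0) ~= 0.177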
3.7 Simulation Results
3.7.1 Simulation Setup
In the simulations, policy rollout [8] is used to approximately solve the online
POMDP formulation of the search problem in Definition 3.2, which uses message
hop count observations to estimate the target location [10]. The proposed rollout
algorithm approximates the infinite horizon, undiscounted cost at the rollout hori-
zon by heuristic (3.32).
We compare the search times obtained by the rollout algorithm to approaches
which use myopic or non-myopic mutual information utility to control the searcher,
similar to [43, 74, 87, 94, 112]. All simulations share the same message hop count
observation model [10] to estimate the target location. We furthermore evaluate the
performance of some extensions of the basic rollout approach (Algorithm 1), such
as different base policies, as well as parallel rollout [16] of multiple base policies.
The WSN deployment area is assumed to be the unit square. The search space
is defined by dividing the deployment area into a square grid of N search cells.
The size of a grid cell thus limits the accuracy to which the target position can be
determined. On the other hand, the cost of a single update of the a posteriori target
probability distribution is proportional to the size of the search space, N, which
is incurred by the rollout algorithm |A| H W times per time step. As the searcher
requires a number of time steps proportional to √N to cover a given distance, the
total simulation time increases as N^{3/2} (ignoring other overhead). A compromise
between target localization accuracy and simulation time on a contemporary multi-
core processor has been determined to be N = 32 × 32. Sensor node locations
are restricted to the discrete cell coordinates, an appropriate assumption given the
discrete nature of the search problem. A cell is occupied by a sensor node with
probability p = 0.75. The cell at the center of the deployment area is declared as
the source of a broadcast message and the hop counts assigned to each sensor node
are generated by the models M1, M2 or M3 defined in Section 3.2.1. The searcher's
initial position is random, but at a fixed Euclidean distance of d = 10/√N away
from the target.
To make the search time performance of different algorithms or base policies
comparable, the simulations are designed to use the same set of hop count realiza-
tions (generated by either one of the models M1 to M3) and the same set of initial
states for every case.
The searcher's hop count observation model is a translated Poisson distribution
(3.2), given the distance r from the hypothesized target and the communication
radius R. The parameters λ₁, λ₂ governing the mean hop count of (3.2) are deter-
mined off-line by maximum likelihood estimation as in Chapter 2, based on the
chosen node occupation probability p = 0.75 and a network mean node degree of
40.
The mobile searcher has a success probability q = 0.75 of making a hop count
observation, given that the visited cell is occupied by a sensor node. The searcher
can either stay in the cell currently visited, or move to one of its immediate neigh-
bor cells, subject to search space boundary constraints.
The rollout parameters used in the simulation are: the rollout horizon is set
to H = 4 (except for one test case which uses H = 2), the number of trajectories
sampled per action is W = 200. The base policies used by the rollout algorithm
are:
random: selects an action a ∈ A_{y_{k+n}} from the current action set uniformly at
random.
constant: repeatedly executes the initial action a ∈ A_{y_k}.
greedy: selects the action which maximizes the a posteriori probability b_X of
detecting the target at the simulated searcher position after the next
action.
3.7.2 Idealized Observations
To evaluate and compare the performance of the rollout and information-driven al-
gorithms (Figures 3.2 to 3.6), we apply the generative hop count model M3 defined
in Section 3.2.1. Recall that in this model, the observed hop counts are independent
draws from the translated Poisson distribution, given the distance from the source
node. This represents an idealized situation, as the observation statistics match the
searcher's observation model exactly.
Figure 3.2 shows the cdfs of the search time for the rollout algorithm (under
a base policy of random action selection and with heuristic terminal cost (3.32)),
and for a myopic, mutual information-driven algorithm. Rollout achieves a better
average search time, as well as a 100% success rate. With the myopic approach,
the searcher suffers from an inability to revisit cells in which the hop count has
been observed previously, so that no further information gain is possible. This can
prevent the searcher from further approaching the target. As Figure 3.3 shows, with
a non-myopic, mutual information-driven algorithm (3.36), the searcher is able to
reach the target with a 100% success rate, although in terms of the average search
time, it is still outperformed by the rollout algorithm.
Figure 3.4 shows the result of comparing the 3 base policies defined earlier in
this section. Selecting base policies is guided by heuristics and is also to a large ex-
tent dictated by computational complexity considerations in the Monte Carlo sim-
ulation context. We find that, out of the set of base policies evaluated, the random
action selection yields the best search time performance. This result indicates that
the solution of our problem benefits from a relatively high degree of exploration
(by random walk) of the reachable belief space to reduce uncertainty, whereas the
greedy policy grants too much weight to unreliable information about the target
location, embodied by the current belief state.
Figure 3.5 shows the cdfs of the search time for parallel rollout (3.19). We
define 2 base policy sets, which are combinations of random action selection with
constant action selection, and random action selection with greedy action selection,
respectively. Due to the use of a heuristic for the terminal cost, we cannot rely on a
guarantee of performance improvement over the single policy case (the cdf for the
single policy rollout under random action selection is shown for reference). The
parallel rollout of random and constant action selection is seen to have improved
performance over random action selection alone most of the time, but not always.
By reducing the rollout horizon to H =2, any performance differences between the
policies diminish (Figure 3.6). This is expected, considering that for H = 1 (the
myopic case), the choice of base policy becomes altogether immaterial.
3.7.3 Empirical Observations
The search times obtained for the generative model M3 constitute a lower bound on
the search time performance, because M3 generates idealized hop count observa-
tions which are independent and whose distribution matches that of the observation
model (3.2). Another set of simulations is designed to evaluate the performance for
the empirical models M1 and M2, both of which generate observations based on
first-passage hop counts in WSNs. Observations generated by M1 represent a re-
alistic set of empirical hop counts in a single instance of a WSN, whereas M2
generates observations to be statistically independent, by virtue of being sampled
from independent realizations of WSNs. Figure 3.7 compares the cdfs of the search
time for the 3 generative models. The performance loss experienced under model
M2 is attributable to the approximate nature of the observation model (3.2). The
additional performance loss under the realistic model M1 is explained by the statis-
tical dependence of the empirical observations. This rather significant effect shows that the naïve Bayes assumption of independent observations incurs a large performance penalty in this application. We discuss approaches to mitigating observation dependence in Section 3.8.
The relative performance advantage of rollout over the non-myopic, information-
driven approach (Figure 3.3) is generally preserved under the generative hop count
models M1 and M2, as Figure 3.8 shows.
Figure 3.2: CDF of the search time for rollout under random action selection, compared to the search time for myopic mutual information utility. Rollout horizon H = 4, hop counts are generated by model M3. Error bars indicate the 95% confidence intervals.
3.8 Statistical Dependence of Observations
In order to gain insight into the statistical dependence between hop count obser-
vations, we compute the local average deviation of the hop count from the model
mean, that is, we estimate E[z − μ(r)]. Local averaging is achieved by Gaussian kernel smoothing with bandwidth σ = 0.1. The average deviation of the hop count from the model mean over the WSN deployment area [0, 1]² is shown in Figure 3.9 for a source node at the center, and for the generative hop count models M1–M3 defined in Section 3.2.1, respectively.
It is helpful to picture, somewhat informally, the broadcast message propagat-
ing through the WSN as a tree-like process. Clearly, the hop counts on related
branches are dependent random variables. As a consequence, we find that locally, there can be a noticeable average deviation of the empirical hop count (M1) from the model mean (Figure 3.9a). For independent empirical hop counts as generated
Figure 3.3: CDF of the search time for rollout under random action selection, with horizon H = 4, compared to the search time for non-myopic mutual information utility. Rollout horizon H = 4, hop counts are generated by model M3. Error bars indicate the 95% confidence intervals.
by model M2, the deviations are much smaller (Figure 3.9b), similar to the independent hop counts drawn from the translated Poisson model M3 (Figure 3.9c).
The statistical dependence between hop count observations can be more formally characterized in terms of the correlation coefficient

ρ = E[(Z_k − μ_k)(Z_{k-1} − μ_{k-1})] / (σ_{Z_k} σ_{Z_{k-1}})   (3.40)

where Z_{k-1} is the hop count observed at distance r from the target, and Z_k the hop count at a distance dr away from the first observation. The correlation coefficient ρ(r, dr), as a function of the distance r from the target and the inter-observation distance dr, is shown in Figures 3.10a, 3.10b and 3.10c for the hop count models M1, M2 and M3 described in Section 3.2.1, averaged over 8 WSN realizations.
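As an illustration of how (3.40) can be estimated empirically, the following Python sketch bins pairs of consecutive hop count observations by the distance r from the source and the inter-observation distance dr, and computes the sample correlation coefficient in each bin. The data layout and bin edges are assumptions of the sketch, not the exact procedure used to produce Figure 3.10.

    import numpy as np

    def correlation_map(pairs, r_edges, dr_edges):
        # pairs: rows (r, dr, z_prev, z_curr) collected from simulated searcher
        # trajectories; returns the sample correlation coefficient of
        # (z_prev, z_curr) in each (r, dr) bin, cf. (3.40).
        pairs = np.asarray(pairs, dtype=float)
        rho = np.full((len(r_edges) - 1, len(dr_edges) - 1), np.nan)
        r_idx = np.digitize(pairs[:, 0], r_edges) - 1
        dr_idx = np.digitize(pairs[:, 1], dr_edges) - 1
        for i in range(rho.shape[0]):
            for j in range(rho.shape[1]):
                sel = (r_idx == i) & (dr_idx == j)
                if sel.sum() > 2:
                    rho[i, j] = np.corrcoef(pairs[sel, 2], pairs[sel, 3])[0, 1]
        return rho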
It seems clear that in order to approach the search time performance achieved
by the rollout algorithm under idealized hop count observations M3, the depen-
Figure 3.4: CDFs of the search time for rollout under 3 different base policies: random, constant and greedy action selection. Rollout horizon H = 4, hop counts are generated by model M3. Error bars indicate the 95% confidence intervals.
dence between empirical observations must be addressed. The value of model M2
is in establishing a lower bound for the expected search time achievable by any
technique for mitigating the dependence of the empirical observations (under oth-
erwise identical rollout parameters).
We discuss two approaches to reduce the impact of observation dependence.
It may be possible to achieve acceptable search-time performance by effectively
avoiding the use of (strongly) dependent observations. Failing this approach, the
observation dependence must be taken into account explicitly in the observation
model.
3.8.1 Mitigation of Observation Dependence
The dependence of the empirical hop count observations (M1) can be reduced in
the following manner: rather than relying on a single broadcast, the source node
launches a sequence of broadcasts, each one effectively inducing a new, indepen-
Figure 3.5: CDF of the search time for rollout under random action, compared to 2 parallel rollout approaches: random and constant action selection, random and greedy action selection. Rollout horizon H = 4, hop counts are generated by model M3. Error bars indicate the 95% confidence intervals.
dent realization of the random node-to-node delays. When interrogated by the
searcher, a node reports the hop count of one (e.g. selected at random) of the
broadcast messages received. Unfortunately, this approach is not energy efficient.
Figure 3.10a suggests that the correlation between successive observations of
the empirical hop count decreases with the distance between two observations. We
may increase the average inter-observation distance by reducing the observation
probability q. Unfortunately, this also reduces the number of hop count observa-
tions available to the searcher in a given time, and contributes to longer search
times. A reduction of the number of observations made by a single searcher, how-
ever, can be offset by deploying multiple cooperating searchers, which can increase
the spatial diversity of the hop count observations, thereby further reducing depen-
dence between observations.
In our application, it is also possible to explicitly determine the degree, in terms
Figure 3.6: CDF of the search time for rollout under random action, compared to 2 parallel rollout approaches: random and constant action selection, random and greedy action selection. Rollout horizon H = 2, hop counts are generated by model M3. Error bars indicate the 95% confidence intervals.
of graph distances, to which observations are related. This approach requires that each message conveys not only the ID of its source node, but the entire history of IDs of the nodes visited. It is then easy for the searcher to determine the most recent common ancestor node for a given pair of observations, which is indicative of their degree of dependence. The searcher can then decide how to make use of a new observation, if at all.
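A minimal sketch of this idea, assuming each message carries the ordered list of node IDs it has visited (from the source outward), is given below; the shared-prefix length serves as a simple measure of how closely two observations are related. The representation is an assumption for illustration only.

    def common_ancestor_depth(path_a, path_b):
        # path_a, path_b: lists of node IDs visited by the message, ordered from
        # the source node outward. The length of the shared prefix is the hop
        # depth of the most recent common ancestor; small values indicate that
        # the two observations branched early and are only weakly dependent.
        depth = 0
        for u, v in zip(path_a, path_b):
            if u != v:
                break
            depth += 1
        return depth

    # Example: two messages that share the first three relay nodes.
    print(common_ancestor_depth([7, 12, 3, 44, 9], [7, 12, 3, 18]))  # -> 3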
3.8.2 Explicit Model of Observation Dependence
In order to explicitly take observation dependence into account, we propose to aug-
ment the hop count observation model (3.2) by computing the conditional proba-
bility distribution of the current hop count, f_Z(z_k | z_{k-1}; r_k, r_{k-1}), given the previous observation and based on the correlation coefficient. The correlation coefficient is assumed to be known and expressed as a function of the distance from the target
Figure 3.7: CDFs of the search time for rollout under random action, with horizon H = 4, for the 3 hop count generation models M1, M2 and M3. Error bars indicate the 95% confidence intervals.
and the inter-observation distance, as in Figure 3.10.
Models for stationary processes of count data have been developed by various
authors [11, 55, 114]. We consider the discrete-valued, auto-regressive INAR(1)
model with support restricted to the non-negative integers [1], as an analogue to
the linear, auto-regressive AR(1) model. We adapt the INAR(1) model to describe
the hop count observation at time k by letting
Z_k − ν_k = ρ ∘ (Z_{k-1} − ν_{k-1}) + (1 − ρ) ∘ (V_k − ν_k)   (3.41)

where Z_k and Z_{k-1} represent the current and the previous hop counts, ν_k and ν_{k-1} are the minimum hop counts (3.2) due to the finite transmission radius R, and Δν = ν_k − ν_{k-1}. The variable V_k is referred to as the innovation. The correlation coefficient ρ is assumed to be non-negative. The binomial thinning operator for discrete random variables, denoted ∘, can be interpreted as an analogue to the multiplication by a scalar a ∈ [0, 1]. Binomial thinning of a discrete random variable Z
Figure 3.8: CDFs of the search time for rollout under random action, with horizon H = 4 and for the 3 hop count generation models M1, M2 and M3. For comparison, the corresponding results for the non-myopic mutual information utility are shown to demonstrate that the performance advantage of our search time heuristic over the non-myopic, information-driven approach is generally preserved, regardless of the choice of generative hop count model. Error bars indicate the 95% confidence intervals.
is defined in [101] by

a ∘ Z ≜ Σ_{i=1}^{Z} B_i   (3.42)

where the B_i are i.i.d. Bernoulli random variables, with Pr{B_i = 1} = a and Pr{B_i = 0} = 1 − a. If the innovation V_k in (3.41) is an i.i.d. translated Poisson variable with parameter λ, and provided that ν_k = ν_{k-1}, it can be shown that the stationary distribution of Z_k is also translated Poisson with parameter λ. This follows from the stationary behavior of the INAR(1) model for Poisson innovations
[55]. Given the hop count observation Z_{k-1} = z_{k-1}, the conditional probability distribution f_Z(z_k | z_{k-1}; r_k, r_{k-1}) is the convolution of a binomial distribution with a Poisson distribution [55], that is

f_Z(z_k | z_{k-1}; ρ, λ) = Σ_{n=0}^{m} \binom{z_{k-1}}{n} ρ^n (1−ρ)^{z_{k-1}−n} e^{−(1−ρ)λ} [(1−ρ)λ]^{z_k−n} / (z_k − n)!   (3.43)

where m = min(z_k, z_{k-1}) and λ, ν are the parameters of the translated Poisson innovation V_k, with μ(r_k) = α_1 r_k + α_2 and ν = ⌈r_k/R⌉ as defined in (3.2).
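For illustration, the conditional distribution (3.43) can be evaluated numerically as a direct binomial–Poisson convolution. The sketch below assumes the hop counts passed in have already been translated by the minimum hop count; it is only a sketch under this assumption, not the thesis' actual implementation.

    import math

    def conditional_hop_pmf(z_curr, z_prev, rho, lam):
        # Convolution of Bin(z_prev, rho) with Pois((1 - rho) * lam), cf. (3.43).
        # z_curr, z_prev are hop counts already translated by the minimum hop count.
        total = 0.0
        for n in range(min(z_curr, z_prev) + 1):
            binom = math.comb(z_prev, n) * rho**n * (1.0 - rho)**(z_prev - n)
            mean = (1.0 - rho) * lam
            poisson = math.exp(-mean) * mean**(z_curr - n) / math.factorial(z_curr - n)
            total += binom * poisson
        return total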
3.9 Conclusion
We have proposed rollout algorithms to approximately solve a path-constrained
search problem, involving an autonomous searcher observing message hop counts
in a WSN, formulated as an online POMDP. The rollout approach uses a novel
heuristic for the terminal cost-to-go, making the approach computationally tractable.
It was shown, by simulation, that the algorithms outperform both myopic and non-
myopic mutual information-driven search approaches. A comparison of the search
performance under different base policies and parallel rollout shows the poten-
tial for further performance improvement, though likely to be gained at additional
computational cost.
We have also evaluated the search performance under different generative mod-
els for the hop count, varying from idealized to more realistic, to quantify the per-
formance loss due to statistically dependent observations. We have discussed meth-
ods of reducing dependence, and adapted an integer autoregressive process model
to account for the dependence explicitly. Future research will focus on models of
the observation dependence and their performance evaluation.
Figure 3.9: Deviation of the local average hop count from the model mean over the unit square [0, 1]², with source node at center. (a) Empirical (M1); (b) Empirical, independent (M2); (c) Idealized (M3).
Figure 3.10: Correlation between observations for different generative hop count models, as a function of the distance from the source node and the inter-observation distance. (a) Empirical (M1); (b) Empirical, independent (M2); (c) Idealized (M3).
Chapter 4
WSN-assisted Autonomous
Mapping
4.1 Introduction
A mobile, autonomous platform is assisted by a wireless sensor network in its task
of inferring a map of the spatial distribution of a physical quantity that is mea-
sured by the sensor nodes. Sensor nodes initiate a broadcast in the network, when
the measured quantity assumes a value in the range of interest. Specically, we
consider randomly deployed networks of location-agnostic wireless sensor nodes,
which broadcast messages by flooding. The node-to-node delays are assumed to
be random. In networks of this type, the hop count of a broadcast message, given
the distance from the source node, can be approximated by a simple parametric
distribution. The mobile platform can interrogate a nearby sensor node to obtain,
with a given success probability, the hop counts of the broadcast messages origi-
nating from different source nodes. By fusing the information from successive hop
count observations, the mobile platform infers the locations of the source nodes
and thereby, the spatial distribution of the quantity of interest. The path taken by
the mobile platform should minimize the resulting mapping error as quickly as
possible. We propose an information-driven path planning approach, in which the
mobile platform acts by maximizing a weighted sum of myopic, mutual informa-
tion gains. We show by simulation that a suitable adaptation of the weights is
effective at reducing the error between the true and the inferred map, by preventing the information gain from being dominated by only a few source nodes.
4.1.1 Background and Motivation
Exploration and mapping of unknown environments is a fundamental application
of autonomous robotics [109, 110]. In its most general form, referred to as the
simultaneous localization and mapping (SLAM) problem [3, 25], neither a map of
the environment nor the location of the autonomous sensor platform within it, are
known a priori and must be inferred from observations.
Much research has been devoted to the problem of generating maps of physi-
cal objects or obstacles in those environments by autonomous platforms typically
equipped with sensor technologies such as laser or ultrasound range finders, imag-
ing sensors etc. [88, 98]. In order to enhance the situational awareness of an
autonomous platform even further, we may be faced with the task of determining
the spatial distribution of a physical quantity, for example the concentration of a
chemical, which cannot be observed from a distance, but whose measurement requires
direct physical contact with the sensor. In this case, it would be rather impractical
to use an autonomous sensor platform to obtain a dense map of the quantity of
interest. Instead, wireless sensor networks (WSNs), due to their pervasive sensing
ability, seem to be a more appropriate solution to this task. In principle, messages
can convey sensor readings and corresponding node locations (provided the nodes
are location-aware) via multiple hops through the WSN and to a data sink, where
a spatial map of the observed quantity is then easily generated. In a surveillance
application described in [21], sensor nodes are assumed to establish their own po-
sitions by GPS or other localization methods, which are subsequently used in the
reporting of events of interest to an unmanned aerial vehicle (UAV). However,
inherent cost and complexity limitations of most practical WSNs can make an ap-
proach requiring full node location awareness and routing capability difficult and
expensive to implement. When deployed in unknown and possibly inhospitable
environments, it is reasonable to assume that the sensor node placement is random
(with a known average density), and that the node locations are not known a priori.
The requirement to establish accurate sensor node locations and multi-hop routes
to the data sink adds considerable complexity to the hardware and software, which
often conflicts with the sensor node energy and cost budgets, especially those im-
posed by very simple, low-cost and low-power devices.
We consider an approach in which a mobile data sink (or "mapper") operates
in the area to be mapped and in which a WSN has been deployed, by placing sen-
sor nodes uniformly at random to take measurements of a quantity of interest. We
assume that due to cost and operational constraints, we are limited to the use of location-agnostic sensor nodes. The mapper can interrogate nearby sensor nodes directly to
obtain message hop count information, based on which the source locations associ-
ated with the received messages are estimated. The hop count is statistically related
to the distance between the source node and the interrogated node. The mobile plat-
form, which is assumed to be location-aware, infers a spatial map of the quantity of
interest based on the hop count information. Messages are disseminated from the
source nodes through ooding with random node-to-node transmission delays, in
a process known as first-passage percolation [15]. We assume, furthermore, that interference between message transmissions can be neglected; see also Section 2.2.2.
For such a network, it was shown in Chapter 2 that the message hop count, condi-
tioned on the distance from the source node, can be approximated by the marginal
distribution of a simple stochastic process.
This chapter is motivated by the problem of an autonomous mapper which
is given the task of estimating the spatial distribution of a physical quantity, by
relying solely on observations of message hop counts in WSNs of the type studied
in Chapter 2. For simplicity, suppose that we are only interested in obtaining a
binary map, indicating the sensor node locations at which the measured quantity
has a value exceeding a threshold. Sensor nodes measuring sub-threshold values
do not initiate a message broadcast, which reduces network traffic. The problem for the mapper can be described as finding a path along which a maximum of information is gathered to infer the map.
We focus on determining a path along which the autonomous mapper interro-
gates the wireless sensor nodes to acquire message hop count observations asso-
ciated with each map element. The objective is to reduce the error between the
inferred and the true map as much as possible, over a finite time horizon.
4.1.2 Chapter Contribution
We propose a path planning method based on a parametric, myopic information-
theoretic utility, which is defined as a weighted sum of the mutual information
between the random variables describing the source node locations and future hop
count observations. The weights are adapted using heuristics to offset the myopic
nature of the utility, in order to improve mapping performance relative to a plain
mutual information utility, but without the expense of true non-myopic path plan-
ning.
4.1.3 Chapter Organization
The chapter is organized as follows. Our system model, which includes a wireless
sensor network component and an autonomous mapper component, is presented
in Section 4.2. The wireless sensor network model is largely identical to the one
described in Section 3.2.1. The mapping path planning algorithm is developed in
Section 4.3. We present simulation results in Section 4.4 and draw conclusions in
Section 4.5. To simplify notation of probabilities, we omit the names of random
variables when this is unambiguous.
4.2 System Model
4.2.1 Wireless Sensor Network Model
We start by recalling the main characteristics of the WSN model used in Section
3.2.1 for target search. The WSN deployment area is assumed to be the unit square
[0, 1]² ⊂ ℝ². Sensor nodes are distributed according to a spatial Poisson point process of density λ, restricted to the deployment area. The nodes have a communication range of R, thus inducing a random geometric network graph G_λ with mean node degree μ = λπR². We assume that the design parameters λ, R are chosen such that the mean node degree exceeds a critical threshold, above which the network is connected with high probability.
For the simulation of an autonomous mapper making hop count observations,
it is convenient to discretize both the mapper position and the sensor node coordinates. Rather than placing sensor nodes randomly in [0, 1]² according to a Poisson point process, we restrict possible node locations to a finite set of N = L² grid coordinates, C = {0, 1/L, . . . , (L−1)/L}², indexed by the set X = {1, . . . , N}. We refer to X as the set of cells. A cell is occupied by a sensor node with probability p. The resulting random node placement over C approximates a Poisson point process with density λ = pN restricted to [0, 1]².
The true map M to be inferred by the mapper is defined as the subset of sensor nodes in X that initiate a broadcast in response to measuring a value in excess of a predefined threshold,

M = {x ∈ X : x is the source of a broadcast}.   (4.1)

The sensor node positions are described by the set of random variables M^(i) with realizations m^(i), where i = 1, . . . , |M|.
Sensor nodes are assigned unique IDs, which are used to tag messages by the
node which initiates a broadcast. Every node forwards the messages to all of its
neighbors within the communication range; however, duplicates of a message with
the same ID are discarded in order to avoid unnecessary retransmissions. Each
node-to-node link has a transmission delay associated with it. We assume that the
node-to-node delays are independent, exponentially distributed random variables
with a common mean. This is a reasonable assumption for networks where com-
munication is unreliable and retransmissions are required, or where sensor nodes
harvest energy from unreliable environmental sources and may be dormant for ran-
dom periods. In such a network, the observed hop count at node x′ is the hop count of the first-passage path, that is the path

π* = argmin_{π ∈ Π(x, x′)} T_d(π)   (4.2)

which minimizes the message transmission delay T_d(π) over all paths π ∈ Π(x, x′) connecting nodes x and x′. In Chapter 2, it was established that for networks with strongly supercritical mean node degree μ, the hop count at distance r from the target can be approximated by a translated Poisson distribution,

f_Z(z; r) = Pois(z − ν(r); λ(r)),   z ≥ ν(r)   (4.3)
where λ(r) = μ(r) − ν(r), with a mean hop count given by μ(r) = α_1 r + α_2 and minimum hop count ν(r) = ⌈r/R⌉. The parameters α_1, α_2 may be inferred using maximum likelihood estimation as described in Section 2.3.3. It is worth noting that α_1, α_2 are invariant to the choice of the common mean node-to-node delay, which may depend on operating conditions such as the average available energy. In other words, α_1, α_2 only depend on network design parameters and can therefore be estimated off-line.
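As an illustration, the translated Poisson observation model (4.3) can be evaluated with a few lines of Python; this is a minimal sketch, and alpha1, alpha2 are placeholders for the maximum likelihood estimates obtained off-line.

    import math

    def hop_count_pmf(z, r, R, alpha1, alpha2):
        # Translated Poisson model (4.3): the hop count at distance r is nu(r)
        # plus a Poisson variable with mean lambda(r) = mu(r) - nu(r).
        nu = math.ceil(r / R)            # minimum hop count
        mu = alpha1 * r + alpha2         # mean hop count (parameters learned off-line)
        lam = max(mu - nu, 1e-9)         # sketch-level guard for very small r
        if z < nu:
            return 0.0
        k = z - nu
        return math.exp(-lam) * lam**k / math.factorial(k)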
For the simulation of map inference in this chapter, we apply the idealized
generative hop count model M3, as defined in Section 3.2.1.
4.2.2 Autonomous Mapper
The autonomous mapper acts by moving, at discrete times, among the cells defined by the grid coordinates C = {c_x}_{x∈X}. At every time step k, the mapper moves to one of the four neighbor cells, subject to mapping area boundary conditions. We assume that the mapper position, given by the variable y ∈ X, is completely determined by the action. Note that in many mapping applications, this assumption is relaxed to account for uncertainty in the mapper position [110].
If the visited cell is occupied by a sensor node, a set-valued observation of
message hop counts is obtained with probability q. Otherwise, the mapper receives
no response at all. A single observation z = (z^(1), . . . , z^(|M|)) is a set of message hop counts associated with all source nodes in the set M defining the true map. Formally, we define the set of observations as Z = N_0 ∪ {ε}, where N_0 are hop counts and ε denotes the observation of no response. The mapper's observation model Pr{z^(i) | m^(i), y} is defined by

Pr{0 | m^(i), y} = { q, if m^(i) = y;  0, otherwise }   (4.4)

Pr{ε | m^(i), y} = { 1 − q, if m^(i) = y;  1 − qp, otherwise }   (4.5)

Pr{n | m^(i), y} = { 0, if m^(i) = y;  q p f_Z(n; r(m^(i), y)), otherwise }   (4.6)
for n > 0. Here, f_Z(n; r(m^(i), y)) is the translated Poisson model (4.3), with the Euclidean distance r(m^(i), y) = ‖c_{m^(i)} − c_y‖_2 between the map element at position m^(i) and the mapper at position y.
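Building on the sketch of (4.3) above, the per-element observation likelihood (4.4)–(4.6) could be coded as follows; the sentinel for the no-response observation and the cell-coordinate lookup are assumptions of the sketch.

    import math

    EPSILON = None  # stands for the "no response" observation

    def observation_likelihood(z, m_cell, y_cell, coords, q, p, R, alpha1, alpha2):
        # Likelihood Pr{z | m, y} of one observation z for a hypothesized map
        # element at cell m_cell, with the mapper at cell y_cell, cf. (4.4)-(4.6).
        r = math.dist(coords[m_cell], coords[y_cell])
        if z == EPSILON:                     # no response
            return 1.0 - q if m_cell == y_cell else 1.0 - q * p
        if m_cell == y_cell:                 # detection: hop count zero only
            return q if z == 0 else 0.0
        if z == 0:
            return 0.0
        return q * p * hop_count_pmf(z, r, R, alpha1, alpha2)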
A significant advantage of the proposed mapping method is that, owing to the unique message IDs, the feature association problem inherent in many mapping applications [110, 116] is sidestepped: there is never ambiguity about the association of a hop count observation z^(i) with the originating map element.
Inference of the map M from hop count observations requires the computation
of a belief state over the space of maps, that is, an a posteriori probability distri-
bution, conditioned on the history of mapper actions and observations. The belief
state has the Markov property: after taking an action and making an observation,
the new belief state is conditionally independent of the past, given the previous
belief state,
b_k(m) = Pr{m | z_{1:k}, y_{1:k}, b_0} = Pr{m | z_k, y_k, b_{k-1}}.   (4.7)
It is generally impractical to compute the joint a posteriori probability distribution
due to the combinatorial size of the hypothesis space: the number of binary map
hypotheses for our problem is
\binom{N}{|M|}   (4.8)
However, by assuming that the map elements M^(i) are pairwise independent random variables, we need to compute only the marginal a posteriori probability distributions, as

Pr{m | z_k, y_k, b_{k-1}} = ∏_{i=1}^{|M|} Pr{m^(i) | z_k^(i), y_k, b_{k-1}}   (4.9)
With every new set-valued hop count observation, the marginal a posteriori belief states can be updated recursively by applying Bayes' rule,

b_k(m^(i)) = η Pr{z_k^(i) | m^(i), y_k} b_{k-1}(m^(i))   (4.10)
where the normalization factor is given by

η^{-1} = Σ_{m^(i) ∈ X} Pr{z_k^(i) | m^(i), y_k} b_{k-1}(m^(i))   (4.11)
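A minimal sketch of the recursive update (4.10)–(4.11) for the marginal belief of a single map element, reusing the observation likelihood sketched above, might look as follows.

    import numpy as np

    def update_belief(belief, z, y_cell, coords, q, p, R, alpha1, alpha2):
        # Bayes update (4.10)-(4.11) of the marginal belief over the location of
        # one map element, given the observation z made at mapper cell y_cell.
        likelihood = np.array([
            observation_likelihood(z, m, y_cell, coords, q, p, R, alpha1, alpha2)
            for m in range(len(belief))
        ])
        posterior = likelihood * belief
        norm = posterior.sum()               # eta^{-1} in (4.11)
        return posterior / norm if norm > 0 else belief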
4.3 Mapping Path Planning
In this section, we formulate a mapping path planning algorithm based on an
information-theoretic utility. Myopic, information-theoretic utilities have been
used for path planning in a variety of target search and tracking problems [43,
110, 112] as well as mapping applications [14, 99].
Assume that at time k, the mapper considers its next action a_{k+1} ∈ A_{y_k}, which takes the mapper to position y_{k+1}. Then, for a given map element M^(i), i.e. the source node of a broadcast, the expected gain in information about its location is the mutual information [19]

I(M^(i); Z_{k+1}^(i) | a_{k+1}).   (4.12)
The information gain for a map, under the assumed pairwise independence of its elements, is the sum of the mutual information of the elements. We define the mapper utility for taking the action a_{k+1} at the next time step as the weighted sum of the mutual information

U(a_{k+1}) = Σ_{i=1}^{|M|} w_k^(i) I(M^(i); Z_{k+1}^(i) | a_{k+1}).   (4.13)

The weights w_k^(i) may be adapted to offset the myopic nature of the utility, without incurring the computational cost of a true, non-myopic planning algorithm. The mapper selects the action which maximizes the utility (4.13) at the next time step,

a_{k+1} = argmax_{a ∈ A_{y_k}} U(a).   (4.14)
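To make the planning step concrete, the following sketch evaluates the weighted utility (4.13) by truncating the hop count alphabet at a maximum value and selects the maximizing action as in (4.14); the truncation level and the helper interfaces are assumptions of the sketch, not the actual simulation code.

    import numpy as np

    def mutual_information(belief, y_next, coords, params, z_max=60):
        # I(M; Z | a) for one map element: sum over the truncated observation
        # alphabet {no response, 0, 1, ..., z_max} and all candidate cells.
        # params = (q, p, R, alpha1, alpha2) as in the observation sketch above.
        observations = [EPSILON] + list(range(z_max + 1))
        mi = 0.0
        for z in observations:
            lik = np.array([observation_likelihood(z, m, y_next, coords, *params)
                            for m in range(len(belief))])
            p_z = float(np.dot(lik, belief))
            if p_z <= 0.0:
                continue
            for m in range(len(belief)):
                if belief[m] > 0.0 and lik[m] > 0.0:
                    mi += belief[m] * lik[m] * np.log2(lik[m] / p_z)
        return mi

    def select_action(actions, next_cell, beliefs, weights, coords, params):
        # Weighted sum of per-element information gains, maximized over the
        # admissible actions, cf. (4.13) and (4.14).
        def utility(a):
            y_next = next_cell(a)
            return sum(w * mutual_information(b, y_next, coords, params)
                       for w, b in zip(weights, beliefs))
        return max(actions, key=utility)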
In many situations, the mapper is expecting to realize a large information gain at the
next time step, by detecting a nearby map element, which is equivalent to observing
a hop count of zero. The detection of a map element M^(i) has a high information content, as the entropy H(M^(i)) becomes zero at once, while observations of map elements further away do not generally promise any large information gains. A myopic utility function prevents the mapper from taking future information gains appropriately into account, by excessive focus on imminent detections. As a result, the mapper may dwell in the neighborhood of some nodes for an extended time seeking to make detections, even though the location estimates of these nodes may already be sufficiently accurate for the purposes of mapping, and likely to improve
over the remaining horizon.
We propose two heuristics for adapting the weights w_k^(i) in the utility function (4.13). The first heuristic simply excludes from further consideration in the path planning those map elements for which the minimum hop count observed up to time k,

z_min,k^(i) = min_{n=0,...,k} z_n^(i)   (4.15)

is smaller than a predefined threshold z_0, by adapting the weights through

w_k^(i) = { 1, if z_min,k^(i) ≥ z_0;  0, otherwise. }   (4.16)
The second heuristic is motivated by the intuition that map elements for which sufficient information has been acquired have low entropy. Weights are adapted based on the entropy of the corresponding map elements (in addition to the minimum hop count),

w_k^(i) = { H(M^(i))/H_max, if z_min,k^(i) ≥ z_0;  0, otherwise, }   (4.17)

where H(M^(i)) is the entropy of map element M^(i), and H_max = log N denotes the maximum entropy for any map element, that is, the entropy of the a priori uniform belief state over the N cells for the location of a map element.
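A compact sketch of the two weight adaptation rules (4.16) and (4.17), with entropies measured in bits as elsewhere in this chapter, is given below; the data layout is an assumption of the sketch.

    import numpy as np

    def adapt_weights(beliefs, z_min, z0, use_entropy=False):
        # Heuristics (4.16) and (4.17): a map element keeps a non-zero weight
        # only while its smallest observed hop count z_min[i] is at least z0;
        # optionally the weight is scaled by the element's normalized entropy.
        n_cells = len(beliefs[0])
        h_max = np.log2(n_cells)
        weights = []
        for b, zm in zip(beliefs, z_min):
            if zm < z0:
                weights.append(0.0)
            elif use_entropy:
                h = -sum(p * np.log2(p) for p in b if p > 0.0)
                weights.append(h / h_max)
            else:
                weights.append(1.0)
        return weights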
4.4 Simulation Results
4.4.1 Simulation Setup
We have simulated the inference of a binary map M ⊂ X by an autonomous mapper, assisted by a wireless sensor network which is represented by a generative hop count model.
The WSN deployment area is assumed to be the unit square. The mapping grid is defined by dividing the deployment area into N = 32 × 32 cells. Sensor node
locations are restricted to the discrete cell coordinates, an appropriate assumption
given the discrete nature of the mapping problem. A cell is occupied by a sensor
node with probability p = 0.75.
Let the map represent the spatial distribution of some physical quantity (e.g.
concentration, temperature, presence/absence) measured by the sensor nodes. De-
spite the multitude of possible phenomena we might be interested in mapping, it
is reasonable to assume that most physical processes are spatially correlated, with
i.i.d. processes being rather uncommon.
The hop count is modeled in an idealized fashion as described in Section 3.2.1:
observations are generated as independent draws from a translated Poisson distri-
bution (4.3), given the distance r from the map element associated with the hop
count observation. The model parameters α_1, α_2 governing the mean hop count of (4.3) are determined off-line by maximum likelihood estimation as described in Chapter 2, given a node occupation probability p = 0.75 and a network mean node degree of μ = 40.
The mapper has a success probability q = 0.75 of making a set-valued obser-
vation of hop counts, z = (z^(1), . . . , z^(|M|)), given that the visited cell is occupied
by a sensor node.
The mapper then moves to the immediate neighbor cell which maximizes the
utility (4.14), subject to mapping area boundary constraints.
4.4.2 Simulation of Map Inference
The result of the map inference is assessed in terms of two metrics, the average
entropy per inferred map element,

H̄ = (1/|M|) Σ_{i=1}^{|M|} H(M^(i))   (4.18)

and the mean square error between the true map and the inferred map, which we define as

mse = (1/|M|) Σ_{i=1}^{|M|} (1 − b(M^(i)))²,   (4.19)

where the b(M^(i)) denote the a posteriori probabilities of the elements of the true map, M.
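The two metrics (4.18) and (4.19) can be computed directly from the marginal posteriors; a minimal sketch, assuming true_map lists the cell indices of the source nodes and beliefs[i] is the posterior over the N cells for the i-th true map element, follows.

    import numpy as np

    def map_metrics(true_map, beliefs):
        # Average entropy per map element (4.18) and mean square error (4.19),
        # where beliefs[i] is the marginal posterior over cells for element i.
        h_sum, se_sum = 0.0, 0.0
        for cell, b in zip(true_map, beliefs):
            h_sum += -sum(p * np.log2(p) for p in b if p > 0.0)
            se_sum += (1.0 - b[cell]) ** 2
        n = len(true_map)
        return h_sum / n, se_sum / n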
The weights w_i in (4.13) are used to mitigate the myopic nature of the utility function, and are adapted heuristically, as described by (4.16) and (4.17), respectively.
Simulation results for a single, random realization of a map are shown in Figure
4.1, with simulation parameters as defined in the preceding section. The average entropy per map element and the mean square error as functions of time are shown in Figure 4.2 and Figure 4.3. The final average entropy and the MSE after 150 time steps are summarized in Table 4.1 and Table 4.2, for different values of the minimum hop count parameter z_0 which controls the weights w_i. We find that the adaptive, weighted mutual information utility is more effective at reducing the error between the true and the inferred map than a plain mutual information utility (corresponding to (4.16) with z_0 = 0). We explain this result by the adaptation of the weights w_i in the path planning process, which assigns more importance to the information gain from map elements whose location is still relatively uncertain. It is also easy to see from Tables 4.1 and 4.2 that for larger z_0, the performance starts to decrease again as map elements are excluded from the utility, and thus from path planning, too soon.
Table 4.1: Map average entropy and MSE, w_i adapted according to (4.16)

    z_0    H̄ [bit]    MSE
    0      3.39        0.72
    1      3.39        0.72
    2      2.23        0.52
    3      2.07        0.47
    4      2.34        0.54

Table 4.2: Map average entropy and MSE, w_i adapted according to (4.17)

    z_0    H̄ [bit]    MSE
    0      2.43        0.56
    1      2.43        0.56
    2      1.94        0.44
    3      2.26        0.51
    4      2.41        0.55
4.5 Conclusion
We have proposed a mapping approach involving an autonomous mapper inter-
acting closely with a randomly deployed wireless sensor network, using the hop
count based observation model developed in Chapter 2 to infer a map of a physical
quantity measured by the sensor nodes. The path of the mobile sink is governed
by a utility function based on a weighted sum of the myopic, mutual information
(4.13) between the random variables describing the individual map elements and
the associated future hop count observations. We propose and simulate a heuristic
for adapting the weights w_i in the utility function, based on a minimum hop count
criterion (4.16) and the entropy of the map elements (4.17). It is found that the
weighted mutual information utility is effective in improving map inference rela-
tive to the plain mutual information utility, in terms of both the average entropy per
map element and the mean square error between the true and the inferred map, and
without the cost and complexity of a true, non-myopic algorithm.
(a) True Map
(b) Inferred Map
(c) Mapping Path
Figure 4.1: The true map, the inferred map and the autonomous mapper path,
shown with hop count observations and non-responses over 150 time
steps.
Figure 4.2: The average entropy per map element over time
Figure 4.3: The mean square error between true and inferred map over time
Chapter 5
Conclusions and Future Work
This chapter provides the conclusions of this thesis and suggests directions of fu-
ture research.
5.1 Conclusions
This thesis addresses several challenges that are encountered in joint applications
of WSNs and autonomous platforms. The need for such applications arises for
example in autonomous mapping or search and rescue, where the pervasive sen-
sor coverage of a WSN and the mobility of an autonomous platform (which may
have additional on-board sensing and actuating capabilities specific to the mission requirements) complement each other well. Assuming that the autonomous platform is able to directly interrogate nearby sensor nodes, complex computational and communication tasks can be offloaded from the sensor nodes, leading to potentially significant simplifications of the node hardware and software. The WSN requirements laid out in this thesis are very simple: sensor nodes are location-agnostic and messages are disseminated by flooding, which eliminates the need for
special-purpose localization hardware and complex routing techniques.
In Chapter 2, we have proposed a new approach to model the hop count dis-
tribution between a source and a sink node in broadcast WSNs, in which message
propagation can be described by first-passage percolation. A limitation of the first-
passage percolation model is the assumption of i.i.d. node-to-node message delays,
a condition which may not be satisfied if node-to-node transmissions interfere with
each other. In this thesis, we assume that such interference can be neglected and
furthermore, in Chapter 4, that multiple simultaneous broadcasts do not interact.
Our approach to model the hop count differs from related works in that the hop
count distribution arises as a maximum entropy model, given a priori informa-
tion about the hop count process from rst-passage percolation theory and under
a few simplifying assumptions. The resulting, approximate hop count distribution
is shown to have a translated Poisson distribution. The maximum entropy method
is very general and can accommodate additional a priori information, such as new
theoretical results, e.g. about higher moments of the first-passage path length.
Simulation results confirm that for WSNs which are characterized as strongly supercritical (i.e. with a mean node degree μ ≫ μ_c), the empirical hop count distribution is well approximated by our model. In these WSNs, the localization error resulting from the application of the translated Poisson model is generally small. For node degrees approaching the critical percolation threshold μ_c, as well as for
small source-to-sink distances, the error performance degrades however. This can
be attributed in part to some of the a priori assumptions made in the maximum
entropy model, which are only known to hold asymptotically, such as the linear-
ity of the mean hop count with distance, or the independence of the hop count
increments. Since the network is not connected below the critical threshold, but
rather consists of a collection of disconnected nodes and small connected clusters,
a broadcast message cannot propagate through (most) of the WSN and the hop
count model breaks down in the subcritical regime.
Due to its low computational complexity, we expect the proposed, approximate
hop count distribution to be a good candidate for use as the observation model in
Bayesian target localization or mapping applications involving low-cost WSNs,
which rely on message hop count (and possibly other information, such as RSSI).
In these applications, the observation model must be evaluated repeatedly, typically
for a very large number of location hypotheses. The proposed hop count model has
the advantage that the model parameters are functions only of the sensor node den-
sity and transmission range, and can therefore be estimated off-line. Importantly,
the model parameters are invariant to the mean node-to-node message delay, which
in turn may depend on operating conditions, such as the average power available
to the sensor nodes.
In the remaining chapters, we have considered two generic applications of a
mobile platform assisted by a WSN: autonomous target search and autonomous
mapping.
In Chapter 3, we have proposed algorithms to approximately solve a path-
constrained target search problem involving an autonomous searcher assisted by a
WSN, using the hop count observation model developed in Chapter 2. The search
problem is formulated as a POMDP, whose solution is generally intractable. We
find an approximate solution by using policy rollout, i.e. an online Monte Carlo planning method. Our rollout approach uses a novel heuristic to approximate the terminal cost-to-go at the rollout horizon, thereby making the approach computationally efficient. Our simulations are restricted to the case of a stationary target,
due to the focus on the hop count observation model and the resulting performance
of rollout, compared to mutual information methods.
It was shown, by simulation, that the rollout algorithm consistently outper-
forms non-myopic mutual information-driven search approaches, at comparable
computational cost. A comparison of the search-time performance under different
base policies and parallel rollout shows potential for further performance improve-
ments, although any performance gain will likely come at additional computational
cost. We have evaluated the search-time performance under different generative
models for the hop count, varying from idealized to more realistic, to quantify the
performance loss due to statistically dependent observations, given the source node
location. We find that in our application, a reliance on the naïve Bayes assumption carries a significant performance penalty. Therefore, we adapt an integer autore-
gressive process model to the hop count observations, which accounts for their
dependence explicitly.
Also in Chapter 3, we have shown that a target search problem described in
terms of an explicit tradeoff between exploitation and exploration (referred to as
"infotaxis" [112]) is mathematically equivalent to a target search with a myopic mutual information utility. This result confirms the intuition that a mutual informa-
tion utility optimally balances exploratory and greedy behavior, when information
gain is the objective. Furthermore, in Chapter 3 we provide a lower bound on
the expected search time for multiple uniformly distributed searchers in terms of
the searcher density, based on the contact distance in Poisson point processes: the
time to reach the target is inversely proportional to the square root of the searcher
density.
In Chapter 4, we have considered an autonomous mapping approach involving
an autonomous mapper interacting closely with a WSN, using the hop count based
observation model developed in Chapter 2, to infer a map of a physical quantity
measured by the sensor nodes. The problem of planning an optimal path for the
mapper is generally intractable. Myopic planning approaches are common, but
may provide only poor performance. In our approach, the path of the mapper is
governed by a parametrized utility function based on a weighted sum of the my-
opic, mutual information between the random variables describing the individual
map elements and the associated future hop count observations. We propose and
simulate a heuristic for adapting the parameters of this utility function based on
both a minimum hop count criterion and the entropy of the map elements. It is
found that the weighted mutual information utility is effective in improving map
inference over a plain mutual information utility, in terms of both the average en-
tropy per map element and the mean square error between the true and the inferred
map, but without the cost and complexity of a truly non-myopic algorithm.
5.2 Future Work
5.2.1 Parametric Models of the Hop Count Distribution
While the translated Poisson model has been shown to arise as the maximum en-
tropy distribution consistent with our a priori assumptions about the hop count,
the stochastic jump process admits a more general compound Poisson distribution,
providing additional degrees of freedom in the model identification. Given further information or assumptions about the hop count process (for example higher moments of the length of the first-passage path), maximum entropy modeling may yield an improved fit for the hop count distribution.
5.2.2 Statistical Dependence of Hop Count Observations
We have found that the statistical dependence of the hop count observations has a
significant, degrading effect on the search-time performance. Future research will focus on models of the statistical observation dependence and their performance evaluation. It is an open question whether a parametric approach, such as our integer autoregressive model, or a simulation-based approach will be more efficient
to address this issue, and more work is required to determine the tradeoffs.
5.2.3 Simulation-based Observation Models
At present, the autonomous searcher relies on simple parametric models for both
the hop count distribution and the statistical dependence of the observations, which
have the advantage of low computational cost. It is likely however, that further
performance improvement will be gained only at the cost of increased complexity
of the parametric formulations. Under these conditions, an alternative is to con-
sider Monte Carlo simulation to model the hop count observation statistics. The
increased computational cost of simulation may be offset by the elimination of in-
creasingly complex parametric modeling. More importantly, simulation provides
much greater flexibility in the choice of WSN message propagation mechanisms,
for which no parametric hop count models are known, or are exceedingly complex.
5.2.4 Multi-Modal Observations
In many WSNs, additional information related to sensor node locations is available
at little or no extra cost (e.g. received signal strength, or RSSI). Future work will
determine how such information can be optimally fused with the hop count infor-
mation to improve localization performance. Furthermore, in joint applications of
WSNs and autonomous platforms, it is reasonable to assume that the mobile plat-
form is equipped with on-board sensors (e.g. laser or ultrasound ranging, imaging
sensors etc.). Information fusion algorithms have to be designed to make optimal
use of all available sensors, WSN as well as on-board.
Bibliography
[1] M. A. Al-Osh and A. A. Alzaid. First-order integer-valued autoregressive INAR(1) process. Journal of Time Series Analysis, 8(3):261–275, May 1987.
[2] N. Antunes, G. Jacinto, and A. Pacheco. On the minimum hop count and connectivity in one-dimensional ad hoc wireless networks. Telecommunication Systems, 39(2):137–143, 2008.
[3] J. Aulinas, Y. Petillot, J. Salvi, and X. Lladó. The SLAM Problem: A Survey. In Proceedings of the 11th Conference on Artificial Intelligence Research and Development, pages 363–371, Oct. 2008.
[4] A. Baddeley, I. Bárány, and R. Schneider. Spatial point processes and their applications. In Stochastic Geometry, volume 1892 of Lecture Notes in Mathematics, pages 1–75. Springer, 2007.
[5] A. D. Barbour and T. Lindvall. Translated Poisson Approximation for Markov Chains. Journal of Theoretical Probability, 19(3):609–630, Dec. 2006.
[6] R. F. Bass. Stochastic Processes. Cambridge University Press, 2011.
[7] M. Berkelaar, K. Eikland, and P. Notebaert. Lpsolve 5.5, Open Source Mixed-Integer Linear Programming System. http://lpsolve.sourceforge.net/5.5/, 2004.
[8] D. P. Bertsekas and D. A. Castañón. Rollout algorithms for stochastic scheduling problems. J. Heuristics, 5(1):89–108, Apr. 1999.
[9] S. Beyme and C. Leung. Modeling the hop count distribution in wireless sensor networks. In Proceedings of the 26th Annual IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pages 1–6, May 2013.
[10] S. Beyme and C. Leung. A stochastic process model of the hop count distribution in wireless sensor networks. Ad Hoc Networks, 17:60–70, June 2014.
[11] A. Biswas and P. X.-K. Song. Discrete-valued ARMA processes. Statistics & Probability Letters, 79(17):1884–1889, Sept. 2009.
[12] J. Blumenthal, R. Grossmann, F. Golatowski, and D. Timmermann. Weighted Centroid Localization in Zigbee-based Sensor Networks. In IEEE International Symposium on Intelligent Signal Processing, 2007, pages 1–6, Oct. 2007.
[13] B. Bollobás and O. Riordan. Percolation. Cambridge University Press, 2006.
[14] F. Bourgault, A. A. Makarenko, S. B. Williams, B. Grocholsky, and H. F. Durrant-Whyte. Information based adaptive robotic exploration. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 540–545, Sept. 2002.
[15] S. R. Broadbent and J. M. Hammersley. Percolation processes. I. Crystals and Mazes. Proceedings of the Cambridge Philosophical Society, 53:629–641, July 1957.
[16] H. S. Chang, R. Givan, and E. K. P. Chong. Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes. Discrete Event Dynamic Systems, 14(3):309–341, July 2004.
[17] E. P. Chong, C. M. Kreucher, and A. O. Hero III. Partially Observable Markov Decision Process Approximations for Adaptive Sensing. Discrete Event Dynamic Systems, 19(3):377–422, Sept. 2009.
[18] COIN. COmputational INfrastructure for Operations Research. http://www.coin-or.org, 2009.
[19] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2006.
[20] S. De, A. Caruso, T. Chaira, and S. Chessa. Bounds on hop distance in greedy routing approach in wireless ad hoc networks. Int. J. Wire. Mob. Comput., 1(2):131–140, Feb. 2006.
[21] E. P. de Freitas, T. Heimfarth, A. Vinel, F. R. Wagner, C. E. Pereira, and T. Larsson. Cooperation among wirelessly connected static and mobile sensor nodes for surveillance applications. Sensors, 13(10):12903–12928, Sept. 2013.
[22] N. Deshpande, E. Grant, and T. Henderson. Target localization and autonomous navigation using wireless sensor networks; a pseudogradient algorithm approach. IEEE Systems Journal, 8(1):93–103, Mar. 2014.
[23] O. Dousse, P. Mannersalo, and P. Thiran. Latency of wireless sensor networks with uncoordinated power saving mechanisms. In Proceedings of the 5th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pages 109–120, May 2004.
[24] S. Dulman, M. Rossi, P. Havinga, and M. Zorzi. On the hop count statistics for randomly deployed wireless sensor networks. Int. J. Sen. Netw., 1(1):89–102, Sept. 2006.
[25] H. Durrant-Whyte and T. Bailey. Simultaneous localization and mapping: part I. IEEE Robot. Autom. Mag., 13(2):99–110, June 2006.
[26] R. Durrett. Probability: Theory and Examples. Cambridge University Press, 2010.
[27] J. N. Eagle. The Optimal Search for a Moving Target When the Search Path is Constrained. Operations Research, 32(5):1107–1115, Oct. 1984.
[28] J. N. Eagle and J. R. Yee. An optimal branch-and-bound procedure for the constrained path, moving target search problem. Operations Research, 38(1):110–114, 1990.
[29] E. Eberlein. Jump-type Lévy processes, pages 439–455. Springer, 2009.
[30] M. Eden. A Two-dimensional Growth Process. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Volume 4: Contributions to Biology and Problems of Medicine, pages 223–239, 1961.
[31] D. Fox, W. Burgard, and S. Thrun. Active Markov localization for mobile robots. Robotics and Autonomous Systems, 25(3-4):195–207, Nov. 1998.
[32] T. Furukawa, F. Bourgault, B. Lavis, and H. Durrant-Whyte. Recursive Bayesian search-and-tracking using coordinated UAVs for lost targets. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA), pages 2521–2526, May 2006.
[33] R. Ganti and M. Haenggi. Dynamic Connectivity and Packet Propagation Delay in ALOHA Wireless Networks. In Conference Record of the 41st Asilomar Conference on Signals, Systems and Computers (ACSSC '07), pages 143–147, Nov. 2007.
[34] R. Ganti and M. Haenggi. Bounds on the information propagation delay in interference-limited ALOHA networks. In Proceedings of the 7th International Conference on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, pages 513–519, June 2009.
[35] E. N. Gilbert. Random Plane Networks. Journal of the Society for Industrial and Applied Mathematics, 9(4):533–543, Dec. 1961.
[36] V. Gintautas, A. A. Hagberg, and L. M. A. Bettencourt. Leveraging synergy for multiple agent infotaxis. In Proceedings of Social Computing, Behavioral Modeling, and Prediction, Los Alamos National Laboratory, Jan. 2008.
[37] R. M. Gray. Probability, Random Processes, and Ergodic Properties. Springer Publishing Company, 2nd edition, 2009.
[38] A. Guitouni and H. A. Masr. A Nonlinear Mixed Integer Program for Search Path Planning Problem. In Proceedings of the 4th Multidisciplinary International Scheduling Conference: Theory and Applications (MISTA 2009), pages 277–290, Aug. 2009.
[39] S. Guo, Y. Gu, B. Jiang, and T. He. Opportunistic flooding in low-duty-cycle wireless sensor networks with unreliable links. In Proceedings of the 15th Annual International Conference on Mobile Computing and Networking (MobiCom '09), pages 133–144, Sept. 2009.
[40] M. Haenggi, J. Andrews, F. Baccelli, O. Dousse, and M. Franceschetti. Stochastic geometry and random graphs for the analysis and design of wireless networks. IEEE J. Sel. Areas Commun., 27(7):1029–1046, Sept. 2009.
[41] J. M. Hammersley and D. J. A. Welsh. First-passage percolation, subadditive processes, stochastic networks and generalized renewal theory. Bernoulli, Bayes, Laplace Anniversary Volume (Neyman, J. and LeCam, L., eds), pages 61–110, 1965.
[42] M. H. Hansen and B. Yu. Model Selection and the Principle of Minimum Description Length. Journal of the American Statistical Association, 96:746–774, 1998.
[43] G. Hoffmann and C. Tomlin. Mobile Sensor Network Control Using Mutual Information Methods and Particle Filters. IEEE Trans. Autom. Control, 55(1):32–47, Jan. 2010.
[44] G. Hoffmann, S. Waslander, and C. Tomlin. Mutual Information Methods with Particle Filters for Mobile Sensor Network Control. In Proceedings of the 45th IEEE Conference on Decision and Control, pages 1019–1024, Dec. 2006.
[45] G. M. Hoffmann. Autonomy for Sensor-Rich Vehicles: Interaction between Sensing and Control Actions. PhD thesis, Stanford University, Sept. 2008.
[46] G. M. Hoffmann, S. L. Waslander, and C. J. Tomlin. Distributed Cooperative Search using Information-Theoretic Costs for Particle Filters with Quadrotor Applications. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, pages 21–24, 2006.
[47] C. D. Howard. Models of First-Passage Percolation. In Probability on Discrete Structures, pages 125–174. Springer, 2004.
[48] R. A. Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
[49] J. Hu, L. Xie, K.-Y. Lum, and J. Xu. Multiagent information fusion and cooperative control in target search. IEEE Trans. Control Syst. Technol., 21(4):1223–1235, July 2013.
[50] K. Ito. Stochastic Processes: Lectures given at Aarhus University. Springer, 2004.
[51] E. T. Jaynes. Prior probabilities. IEEE Trans. Syst. Sci. Cybern., 4:227–241, 1968.
[52] E. T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70(9):939–952, Sept. 1982.
[53] E. T. Jaynes. Entropy and Search Theory. In C. R. Smith and J. W. T. Grandy, editors, Maximum-Entropy and Bayesian Methods in Inverse Problems, pages 443–454. D. Reidel, 1985.
[54] O. Johnson. Log-concavity and the maximum entropy property of the Poisson distribution. Stochastic Processes and their Applications, 117(6):791–802, June 2007.
[55] R. C. Jung, G. Ronning, and A. Tremayne. Estimation in conditional first order autoregression with discrete support. Statistical Papers, 46(2):195–224, Apr. 2005.
[56] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, May 1998.
[57] R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Fluids Engineering, 82(1):35–45, 1960.
[58] S. M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Inc., 1993.
[59] H. Keeler and P. Taylor. A stochastic analysis of a greedy routing scheme in sensor networks. SIAM Journal on Applied Mathematics, 70(7):2214–2238, Apr. 2010.
[60] C. Keller. Applying optimal search theory to inland SAR: Steve Fossett case study. In Proceedings of the 13th Conference on Information Fusion (FUSION '10), pages 1–8, 2010.
[61] J. F. C. Kingman. Subadditive ergodic theory. The Annals of Probability, 1(6):883–899, Dec. 1973.
[62] D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison Wesley Longman Publishing Co., Inc., 2nd edition, 1998.
[63] Z. Kong and E. M. Yeh. Characterization of the Critical Density for Percolation in Random Geometric Graphs. In Proceedings of the IEEE International Symposium on Information Theory (ISIT '07), pages 151–155, June 2007.
[64] Z. Kong and E. M. Yeh. Information dissemination in large-scale wireless networks with unreliable links. In Proceedings of the 4th Annual International Conference on Wireless Internet, pages 321–329, Nov. 2008.
[65] T. M. Liggett. An improved subadditive ergodic theorem. The Annals of Probability, 13(4):1279–1285, Nov. 1985.
[66] T. M. Liggett. Ultra Logconcave Sequences and Negative Dependence. Journal of Combinatorial Theory, Series A, 79(2):315–325, Aug. 1997.
[67] M. L. Littman, A. R. Cassandra, and L. P. Kaelbling. Learning Policies for Partially Observable Environments: Scaling Up. In Readings in Agents, pages 495–503. Morgan Kaufmann Publishers Inc., 1998.
[68] N. Lo, J. Berger, and M. Noel. Toward optimizing static target search path planning. In Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defence Applications (CISDA), pages 1–7, 2012.
[69] W. S. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28(1):47–65, 1991.
[70] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2002.
[71] A. Mahajan and D. Teneketzis. Multi-armed bandit problems. In Foundations and Applications of Sensor Management, pages 121–151. Springer, 2008.
[72] G. Mao, Z. Zhang, and B. Anderson. Probability of k-hop connection under random connection model. IEEE Commun. Lett., 14(11):1023–1025, Nov. 2010.
[73] A. Marjovi, J. Nunes, P. Sousa, R. Faria, and L. Marques. An olfactory-based robot swarm navigation method. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation (ICRA), pages 4958–4963, May 2010.
[74] J.-B. Masson, M. B. Bechet, and M. Vergassola. Chasing information to search in random environments. J. Phys. A: Math. Theor., 42(43):434009, Oct. 2009.
[75] A. Molisch. Wireless Communications. Wiley, 2nd edition, 2010.
[76] S. Nath, V. Ekambaram, A. Kumar, and P. V. Kumar. Theory and Algorithms for Hop-Count-Based Localization with Random Geometric Graph Models of Dense Sensor Networks. ACM Trans. Sen. Netw., 8(4):1–38, Sept. 2012.
[77] D. Niculescu and B. Nath. DV Based Positioning in Ad Hoc Networks. Telecommunication Systems, 22:267–280, 2003.
[78] U. Orguner, P. Skoglar, D. Tornqvist, and F. Gustafsson. Combined point-mass and particle filter for target tracking. In Proceedings of the 2010 IEEE Aerospace Conference, pages 1–10, Mar. 2010.
[79] C. Papadimitriou and J. N. Tsitsiklis. The Complexity of Markov Decision Processes. Math. Oper. Res., 12(3):441–450, Aug. 1987.
[80] N. Patwari, J. Ash, S. Kyperountas, A. Hero, R. Moses, and N. Correal. Locating the nodes: cooperative localization in wireless sensor networks. IEEE Signal Process. Mag., 22(4):54–69, July 2005.
[81] M. Penrose. Random Geometric Graphs (Oxford Studies in Probability). Oxford University Press, USA, 2003.
[82] G. Rahmatollahi and G. Abreu. Closed-form hop-count distributions in random networks with arbitrary routing. IEEE Transactions on Communications, 60(2):429–444, 2012.
[83] R. Rezaiifar and A. M. Makowski. From optimal search theory to sequential paging in cellular networks. IEEE J. Sel. Areas Commun., 15(7):1253–1264, Sept. 1997.
[84] D. Richardson. Random Growth in a Tessellation. Mathematical Proceedings of the Cambridge Philosophical Society, 74:515–528, Nov. 1973.
[85] S. Ross, J. Pineau, S. Paquet, and B. Chaib-draa. Online planning algorithms for POMDPs. J. Artif. Intell. Res., 32(2):663–704, July 2008.
[86] J. O. Royset and H. Sato. Route optimization for multiple searchers. Naval Research Logistics (NRL), 57(8):701–717, 2010.
[87] A. Ryan and J. K. Hedrick. Particle filter based information-theoretic active sensing. Robotics and Autonomous Systems, 58(5):574–584, May 2010.
[88] Z. A. Saigol, R. W. Dearden, J. L. Wyatt, and B. J. Murton. Information-lookahead planning for AUV mapping. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1831–1836, July 2009.
[89] S. Sarid, A. Shapiro, E. Rimon, and Y. Edan. Classifying the heterogeneous multi-robot online search problem into quadratic time competitive complexity class. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 4962–4967, May 2011.
[90] I. C. Schick and S. K. Mitter. Robust recursive estimation in the presence of heavy-tailed observation noise. The Annals of Statistics, 22(2):1045–1080, 1994.
[91] S. Shue and J. Conrad. A survey of robotic applications in wireless sensor networks. In Proceedings of 2013 IEEE Southeastcon, pages 1–5, Apr. 2013.
[92] S. Singh and V. Krishnamurthy. The optimal search for a Markovian target when the search path is constrained: the infinite-horizon case. IEEE Trans. Autom. Control, 48(3):493–497, Mar. 2003.
[93] P. Skoglar. Planning Methods for Aerial Exploration and Ground Target Tracking. PhD thesis, Linköping University, Oct. 2009.
[94] P. Skoglar, U. Orguner, and F. Gustafsson. On information measures based on particle mixture for optimal bearings-only tracking. In Proceedings of the 2009 IEEE Aerospace Conference, pages 1–14, Mar. 2009.
[95] R. T. Smythe and J. C. Wierman. First-Passage Percolation on the Square Lattice, I. Advances in Applied Probability, 9(1):38–54, Mar. 1977.
[96] R. T. Smythe and J. C. Wierman. First-Passage Percolation on the Square Lattice, III. Advances in Applied Probability, 10(1):155–171, Mar. 1978.
[97] N.-O. Song and D. Teneketzis. Discrete search with multiple sensors. Mathematical Methods of Operations Research, 60(1):1–13, 2004.
[98] A. Souza, R. Maia, R. Aroca, and L. Goncalves. Probabilistic robotic grid mapping based on occupancy and elevation information. In Proceedings of the 2013 16th International Conference on Advanced Robotics (ICAR), pages 1–6, Nov. 2013.
[99] C. Stachniss, G. Grisetti, and W. Burgard. Information Gain-based Exploration Using Rao-Blackwellized Particle Filters. In Proceedings of Robotics: Science and Systems (RSS), pages 65–72, June 2005.
[100] D. Stauffer and A. Aharony. Introduction to percolation theory. Taylor & Francis, 1992.
[101] F. W. Steutel and K. van Harn. Discrete analogues of self-decomposability and stability. The Annals of Probability, 7(5):893–899, Oct. 1979.
[102] R. Stoleru, T. He, and J. A. Stankovic. Range-Free Localization. In Secure Localization and Time Synchronization for Wireless Sensor and Ad Hoc Networks, volume 30, pages 3–31. Springer US, 2007.
[103] L. Stone. Theory of Optimal Search. Academic Press, 1975.
[104] D. Stoyan, W. S. Kendall, and J. Mecke. Stochastic Geometry and Its Applications, 2nd Edition. Wiley, 1996.
[105] B. Sturmfels. Polynomial Equations and Convex Polytopes. The American Mathematical Monthly, 105(10):907–922, Dec. 1998.
[106] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, 1st edition, 1998.
[107] X. Ta, G. Mao, and B. Anderson. Evaluation of the Probability of K-Hop Connection in Homogeneous Wireless Sensor Networks. In Proceedings of the 2007 IEEE Global Telecommunications Conference, pages 1279–1284, Nov. 2007.
[108] P. Thompson, E. Nettleton, and H. F. Durrant-Whyte. Distributed large scale terrain mapping for mining and autonomous systems. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25–30, 2011, pages 4236–4241, 2011.
[109] S. Thrun. Robotic Mapping: A Survey. In Exploring Artificial Intelligence in the New Millennium, pages 1–35. Morgan Kaufmann Publishers Inc., 2003.
[110] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, 2005.
[111] K. E. Trummel and J. R. Weisinger. The complexity of the optimal searcher path problem. Oper. Res., 34(2):324–327, Mar. 1986.
[112] M. Vergassola, E. Villermaux, and B. I. Shraiman. Infotaxis as a strategy for searching without gradients. Nature, 445(7126):406–409, Jan. 2007.
[113] S. Vural and E. Ekici. On Multihop Distances in Wireless Sensor Networks with Random Node Locations. IEEE Trans. Mobile Comput., 9(4):540–552, Apr. 2010.
[114] C. H. Weiss. Thinning operations for modeling time series of counts – a survey. Advances in Statistical Analysis, 92(3):319–341, Aug. 2008.
[115] J. C. Wierman. First-Passage Percolation on the Square Lattice, II. Advances in Applied Probability, 9(2):283–295, June 1977.
[116] R. Wong, J. Xiao, S. Joseph, and Z. Shan. Data association for simultaneous localization and mapping in robotic wireless sensor networks. In Proceedings of the 2010 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pages 459–464, July 2010.
[117] M. Zorzi and R. Rao. Geographic random forwarding (GeRaF) for ad hoc and sensor networks: multihop performance. IEEE Trans. Mobile Comput., 2(4):337–348, Dec. 2003.
Appendix A
Necessary Condition for the Hop
Count Process
In this Appendix, it is shown that neither subadditivity nor superadditivity hold for the hop count of the first-passage path in the random geometric graph G. We utilize this property in Section 2.3.1 as a necessary condition for the hop count to be modeled as a sum of increments (2.8).

Proof. For given nodes o, x, y ∈ G, the first-passage path from o to y has a passage time which satisfies T_{o,y} ≤ T_{o,x} + T_{x,y}, due to subadditivity (2.3). Suppose that the associated hop count is subadditive, i.e. N_{o,y} ≤ N_{o,x} + N_{x,y}, with equality if x is on the first-passage path. Assume that x is not on the first-passage path. Then, there is at least one edge not common with either of the paths from o to x or from x to y, denoted (u, v). We construct a new graph realization G′ by opening edge (u, v) and instantiating a new node w such that the Euclidean distances ‖u − w‖ ≤ R and ‖v − w‖ ≤ R, i.e. node w is linked to both u and v. Furthermore, set the passage times of the new edges (u, w) and (w, v) such that their sum equals the passage time of (u, v); for any additional edges that may form by inserting w, choose edge passage times large enough so that the passage times of the first-passage paths from o to y, from o to x and from x to y remain unchanged from G. However, the hop count of the first-passage path from o to y has now increased by 1. We can repeat inserting nodes in this manner, until N_{o,y} > N_{o,x} + N_{x,y}. This contradicts our assumption that the hop count is subadditive.
By an analogous argument, it can be shown that the hop count is not superadditive, either.
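
This splitting construction is easy to reproduce numerically. The sketch below is a toy illustration only: the three labelled nodes, the chosen edge passage times and the use of the networkx shortest-path routine are assumptions made for the example, not part of the argument above. It replaces the direct first-passage edge between o and y by a chain with the same total passage time and prints the hop counts before and after.

    # Toy check that hop counts along first-passage (minimum passage time)
    # paths need not be subadditive; edge passage times are illustrative.
    import networkx as nx

    def hops(G, s, t):
        """Hop count of the minimum passage time (first-passage) path."""
        return len(nx.dijkstra_path(G, s, t, weight="time")) - 1

    G = nx.Graph()
    G.add_edge("o", "x", time=1.0)
    G.add_edge("x", "y", time=1.0)
    G.add_edge("o", "y", time=1.0)      # direct first-passage path o -> y

    print(hops(G, "o", "y"), hops(G, "o", "x") + hops(G, "x", "y"))   # 1 <= 2

    # Split the direct edge o-y into a chain of four short edges with the
    # same total passage time: the first-passage path o -> y keeps its
    # passage time but now takes 4 hops, exceeding N_{o,x} + N_{x,y} = 2.
    G.remove_edge("o", "y")
    chain = ["o", "w1", "w2", "w3", "y"]
    for u, v in zip(chain[:-1], chain[1:]):
        G.add_edge(u, v, time=0.25)

    print(hops(G, "o", "y"), hops(G, "o", "x") + hops(G, "x", "y"))   # 4 > 2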
Appendix B
Proof of Strong Mixing Property
In this Appendix, it is shown that the increments of the hop count process on the unit distance graph, defined on the square lattice Z², are asymptotically independent, or more formally, strongly mixing. We use this property in Section 2.3.1 to motivate a simplifying independence assumption, as one of the defining properties of the increments of a Lévy process.

A network represented by the unit distance graph on the square lattice Z² may be characterized as a dynamical system (Ω, F, μ, θ), that is, a probability space together with a measurable transformation θ : Ω → Ω [37]. Here, Ω = R₊^E is the configuration space of the network, formed by the Cartesian product of its i.i.d. edge passage time variables (τ(e))_{e∈E}. Furthermore, F is a σ-algebra and μ is a probability measure preserved under the transformation θ : Ω → Ω, defined as a shift of the i.i.d. edge passage time variables by one unit to the left, that is,

    μ(F) = μ(θ^{−1}F)  for any F ∈ F.    (B.1)

The measure-preserving transformation θ and a random variable N_{0,n} : Ω → Z₊, taking values in the measurable space (Z₊, A), together define the strictly stationary process of the hop count increments

    N_{kn,(k+1)n}(ω) = N_{0,n}(θ^{kn}(ω)).    (B.2)

Given the adjacent hop count increments N_{0,n} and N_{n,2n}, the arbitrary hop count events A, B ∈ A and their pre-images F, G ∈ F correspond through

    F = N_{0,n}^{−1}(A) = {ω : N_{0,n}(ω) ∈ A}    (B.3)
    G = N_{n,2n}^{−1}(B) = {ω : N_{n,2n}(ω) ∈ B}.    (B.4)

It is common to write {N_{0,n} ∈ A} etc. for these events.
Definition B.1 ([26]). A dynamical system (Ω, F, μ, θ) with a measure-preserving (shift) transformation θ is said to be strongly mixing, if for all measurable sets F, G ∈ F the condition

    lim_{k→∞} μ(F ∩ θ^{−k}G) = μ(F)μ(G)    (B.5)

is satisfied.

Proposition 1. The increments N_{0,n}, N_{n,2n}, . . . are strongly mixing.
Proof. Without loss of generality, let n = 2j for j = 0, 1, . . .. We define a two-parameter family of translated and scaled cylinder sets C(t, s) ⊂ Z², where t ∈ Z and s ∈ N, by

    C(t, s) = ({1, . . . , n} × Z) ∪ {o}                     if t = 0, s = 1,
              ({n+1, . . . , 2n} × Z) ∪ {n e_x}              if t = n, s = 1,
              {t − (s−1)j + 1, . . . , t + (s+1)j} × Z       otherwise.    (B.6)

Let N_{0,n}|_{C(0,k)} denote the hop count variable associated with the first-passage path connecting o and n e_x, restricted to the cylinder C(0, k), for k = 1, 2, . . .. Similarly, N_{kn,(k+1)n}|_{C(kn,k)} denotes the hop count variable associated with the first-passage path connecting kn e_x and (k+1)n e_x, restricted to the cylinder C(kn, k). For every k, the cylinder sets C(0, k) and C(kn, k) define two disjoint subgraphs on the square lattice. First-passage paths restricted to disjoint subsets of the square lattice clearly have independent hop counts, therefore we have

    μ({N_{0,n}|_{C(0,k)} ∈ A} ∩ {N_{kn,(k+1)n}|_{C(kn,k)} ∈ B})
        = μ(N_{0,n}|_{C(0,k)} ∈ A) μ(N_{kn,(k+1)n}|_{C(kn,k)} ∈ B).    (B.7)

By the shift-invariance of the measure μ,

    μ({N_{0,n}|_{C(0,k)} ∈ A} ∩ θ^{−(k−1)n}{N_{n,2n}|_{C(n,k)} ∈ B})
        = μ(N_{0,n}|_{C(0,k)} ∈ A) μ(N_{n,2n}|_{C(n,k)} ∈ B).    (B.8)

Taking k → ∞ relaxes the cylinder restriction, and we obtain

    lim_{k→∞} μ({N_{0,n} ∈ A} ∩ θ^{−(k−1)n}{N_{n,2n} ∈ B}) = μ(N_{0,n} ∈ A) μ(N_{n,2n} ∈ B),    (B.9)

that is, the hop count increments satisfy the strong mixing condition (B.5).
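
As a numerical illustration of the asymptotic independence, one can estimate the sample correlation between the hop counts of the first-passage paths from o to n e_x and from kn e_x to (k+1)n e_x on a finite window of the lattice. The sketch below only approximates the setting above: it assumes i.i.d. Exp(1) edge passage times, a finite lattice window, and the networkx/numpy packages; the window size, the number of trials and the chosen separations are arbitrary.

    # Monte Carlo sketch: sample correlation between hop counts of two
    # first-passage paths on a window of the square lattice, with assumed
    # i.i.d. Exp(1) edge passage times.
    import networkx as nx
    import numpy as np

    rng = np.random.default_rng(0)

    def increment_hops(n, k, margin=5):
        """Hop counts of the first-passage paths o -> n*e_x and kn*e_x -> (k+1)n*e_x."""
        G = nx.grid_2d_graph(range(-margin, (k + 1) * n + margin + 1),
                             range(-margin, margin + 1))
        for u, v in G.edges():
            G[u][v]["time"] = rng.exponential(1.0)
        def hops(s, t):
            return len(nx.dijkstra_path(G, s, t, weight="time")) - 1
        return hops((0, 0), (n, 0)), hops((k * n, 0), ((k + 1) * n, 0))

    n, trials = 4, 200
    for k in (1, 2, 4):
        samples = np.array([increment_hops(n, k) for _ in range(trials)])
        rho = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
        print(f"separation k = {k}: sample correlation = {rho:+.3f}")

For small separations the two paths explore overlapping regions of the lattice and a weak correlation is expected; as k grows, the sample correlation should decay towards zero, in line with (B.5).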
Appendix C
Constrained-Path Search as
Integer Program
We consider the problem of finding a target in a wireless sensor network. Variations of this problem include, for example, paging a user in some ad hoc network, or a mobile searcher able to interrogate nodes in a sensor network. The mobile searcher looks for the target in the current cell and, if unsuccessful, passes to one of the adjacent cells. This imposes a constraint on the possible sequences of visited cells, or the search path. We assume that an a-priori probability distribution of the location of the target is known to the path planning algorithm. The path planner's objective is to minimize the expected time to detection. We show that this problem can be expressed and solved as an integer linear program, and evaluate its performance in comparison with a maximum probability of detection search, as well as an unconstrained search.
C.1 Introduction
For the rollout algorithm proposed in Chapter 3, a possible heuristic to compute the expected search time is based on the assumption that beyond the rollout horizon, no further hop count observations (other than detecting the target, defined as observing a hop count of zero) are made. This assumption is reasonable, especially with the search progressing to the late stage, when hop count observations tend to be exhausted in the vicinity of the target.

In this Appendix, we consider the path-constrained search for a target node in a simplified wireless sensor network. It is assumed that a belief state of the target is available to the searcher, inferred from past hop count observations up to the rollout horizon, as described in Section 3.2.2. Within the scope of the algorithms studied in this Appendix, we refer to the belief state as the a priori probability distribution of the target.

While optimal search problems are commonly associated with e.g. search and rescue operations or exploration [60], they are encountered also in applications such as target detection or paging in wireless sensor or ad hoc networks [83]. The theory of optimal search traces its origins to anti-submarine operations during WW-II. The general setup involves a target hidden in some search space, whose whereabouts are characterized by an a priori probability distribution, and a searcher trying to detect the target, typically such that the cumulative probability of detection is maximized, subject to a cost constraint, e.g. a finite search time horizon. If the search effort is applied continuously and the searcher's movement is not restricted by a path constraint, the problem can be solved analytically under suitable conditions for the detection function [103]. If the search is performed by taking discrete looks and a path constraint is imposed, numerical optimization techniques must generally be resorted to. Path-constrained search is a computationally challenging problem in so far as its complexity has been shown to be NP-complete [111]. A number of algorithms for the solution of the optimal, path-constrained search have been published by various authors. These include approaches based on dynamic programming (POMDP) [27] as well as branch-and-bound (non-linear) integer programming [28, 38, 68, 86].

In the majority of related works, the search planner's objective is the maximization of the target detection probability over a finite time horizon. In this Appendix, we consider the minimization of the expected search time, and show how to express and solve this search problem as an integer linear program. We compare the detection performance due to our objective function to the search with maximum probability of detection, and also to the upper bound on the cumulative detection probability obtained by the optimal search with path constraint relaxation.

The Appendix is structured as follows: in Section C.2, we state our assumptions, and formulate integer linear programs with the objectives of minimum expected search time and maximum probability of detection, respectively. In Section C.3, we provide some simulation results for the optimal search path and compare the cumulative detection probability obtained for the different objective functions. Conclusions are drawn in Section C.4.
C.2 System Model
The target search is performed on a 2-dimensional search space consisting of J cells, j = 1, . . . , J. The target's whereabouts are characterized by an a priori probability distribution p_j over the search space. If the target is in the cell interrogated by the searcher, it will be detected with probability q = 1. We do not allow false detections in case of an absent target. If the searcher does not detect the target in the present cell j, it moves to a cell in the set N(j), defined as the set of neighbors of j. We assume that the searcher is initially located in cell j_0; the target occupies one of the J cells. The search proceeds through time steps k = 1, . . . , H, with the horizon H ≤ J. The search path, i.e. the sequence of cells j(k), k = 1, . . . , H followed by the searcher, is denoted by π. Then, the probability of detecting the target on the search path is

    P_π = Σ_{k=1}^{H} p_{j(k)}.    (C.1)

We formulate the search problem as a binary integer linear program. A set of binary decision variables x_j[k] is used to indicate the location of the searcher in cell j at time k.
C.2.1 Minimizing the Expected Search Time

The objective is to minimize the expected search time over all paths of length H, given that the target is on the search path π,

    minimize  E[K | π] = ( Σ_{k=1}^{H} k p_{j(k)} ) / P_π    (C.2)

where P_π is given by (C.1). Because there is no obvious way to transform this objective into a linear function of binary decision variables, we consider the related problem

    minimize  Σ_{k=1}^{H} k p_{j(k)}.    (C.3)

If the horizon is H < J, a problem with this objective function is that a search planner could minimize it by simply avoiding cells with high probability of containing the target, and visit cells with low probability instead. The objective function is therefore augmented with a term that penalizes such attempts by giving weight to the probability of the cells not visited:

    minimize  Σ_{k=1}^{H} k p_{j(k)} + (H + 1)(1 − P_π).    (C.4)

In terms of the binary decision variables x_j[k], the objective function (C.4) is now expressed as

    Σ_{k=1}^{H} k Σ_{j=1}^{J} p_j x_j[k] + (H + 1) Σ_{k=1}^{H} Σ_{i=1}^{J} p_i (1 − x_i[k])    (C.5)
      = Σ_{k=1}^{H} Σ_{j=1}^{J} ( k p_j x_j[k] + (H + 1) p_j (1 − x_j[k]) )
      = Σ_{k=1}^{H} Σ_{j=1}^{J} (k − H − 1) p_j x_j[k] + (H + 1) Σ_{k=1}^{H} Σ_{j=1}^{J} p_j
      = Σ_{k=1}^{H} Σ_{j=1}^{J} (k − H − 1) p_j x_j[k] + (H + 1) H.    (C.6)
Before we state the constraints of the search problem, we show that no smaller value can be obtained for the objective function (C.6) by exchanging a cell on the search path for one of the unvisited cells of lower probability.

Proof. Referring to (C.5), let j be a cell on, and i a cell off, the optimal search path, with p_j > p_i. We show that the objective value increases if we exchange p_j for p_i. From (C.5), we see that the contribution of cells j and i to the value of the objective function is k p_j + (H + 1) p_i. We determine that

    k p_j + (H + 1) p_i < k p_i + (H + 1) p_j    (C.7)

holds if and only if p_j > p_i, for k ≤ H. Therefore, visiting cell i instead of j can only increase the value of the objective function (C.5).
Constant terms in the objective function (C.6) are ignored during optimization, so that the final optimization problem takes the form

    minimize    Σ_{k=1}^{H} Σ_{j=1}^{J} (k − H − 1) p_j x_j[k]    (C.8)
    subject to  Σ_{j=1}^{J} x_j[k] = 1                for all k    (C.9)
                Σ_{k=1}^{H} x_j[k] ≤ 1                for all j    (C.10)
                Σ_{i∈N(j)} x_i[k] − x_j[k+1] ≥ 0      for all j, k    (C.11)
                x_{j_0}[1] = 1    (C.12)
                x_j[k] ∈ {0, 1}                       for all j, k    (C.13)

where j = 1, . . . , J and k = 1, . . . , H. Constraint (C.9) ensures that one and only one cell can be visited at each time k, (C.10) prevents the searcher from looking in cells already visited and (C.11) restricts the searcher to move only to neighbors of the current cell. Constraint (C.12) finally defines the searcher's initial position, cell j_0.
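
For concreteness, the program (C.8)-(C.13) can be transcribed almost verbatim with a generic modelling layer. The sketch below uses the PuLP package and its bundled CBC solver as a stand-in for the lpSolve/Cbc tool chain referenced in Section C.3; the grid size, target prior, start cell and horizon are small illustrative choices, not the configuration used in Section C.3.

    # Sketch of the minimum expected search time ILP (C.8)-(C.13) in PuLP.
    import pulp

    NX, NY = 5, 4                          # grid dimensions, J = NX * NY cells
    J = NX * NY
    H = 8                                  # search horizon
    j0 = 0                                 # searcher's initial cell (assumed)
    raw = [1.0 + (j // NX) + (j % NX) for j in range(J)]
    p = [w / sum(raw) for w in raw]        # a priori target distribution (toy)

    def neighbors(j):
        """4-connected neighbors N(j) of cell j."""
        r, c = divmod(j, NX)
        cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [rr * NX + cc for rr, cc in cand if 0 <= rr < NY and 0 <= cc < NX]

    prob = pulp.LpProblem("min_expected_search_time", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (range(J), range(1, H + 1)), cat="Binary")

    # objective (C.8)
    prob += pulp.lpSum((k - H - 1) * p[j] * x[j][k]
                       for j in range(J) for k in range(1, H + 1))
    # exactly one cell is visited at each time step (C.9)
    for k in range(1, H + 1):
        prob += pulp.lpSum(x[j][k] for j in range(J)) == 1
    # no cell is visited more than once (C.10)
    for j in range(J):
        prob += pulp.lpSum(x[j][k] for k in range(1, H + 1)) <= 1
    # move only to neighbors of the current cell (C.11)
    for j in range(J):
        for k in range(1, H):
            prob += pulp.lpSum(x[i][k] for i in neighbors(j)) - x[j][k + 1] >= 0
    # initial position (C.12)
    prob += x[j0][1] == 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    path = [j for k in range(1, H + 1) for j in range(J) if x[j][k].value() > 0.5]
    print("optimal search path (cell indices):", path)

Replacing the objective by the maximization of Σ_k Σ_j p_j x_j[k], with the constraints unchanged, yields the maximum probability of detection program of Section C.2.2.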
C.2.2 Maximizing the Detection Probability

For comparison, we also present an integer linear program describing the maximum detection probability search. Here, the search planner simply maximizes the probability of detecting the target over all paths of length H, with no particular weight given to early detection. As can be seen, this problem differs from the minimum expected time search only in the objective function.

    maximize    Σ_{k=1}^{H} Σ_{j=1}^{J} p_j x_j[k]    (C.14)
    subject to  Σ_{j=1}^{J} x_j[k] = 1                for all k    (C.15)
                Σ_{k=1}^{H} x_j[k] ≤ 1                for all j    (C.16)
                Σ_{i∈N(j)} x_i[k] ≥ x_j[k+1]          for all j, k    (C.17)
                x_{j_0}[1] = 1    (C.18)
                x_j[k] ∈ {0, 1}                       for all j, k    (C.19)
C.3 Performance Evaluation

To evaluate and compare the performance of the two search problems (C.8) and (C.14), we define a 10 × 7 grid as the search space (J = 70), on which the searcher is allowed to move subject to the next-neighbor constraint. This setup represents a wireless sensor network with cell centers corresponding to node locations, as described in Section 3.2.1. The a priori target distribution over the search space is defined by a Gaussian mixture with the components

    0.28 N((4, 2), 0.8) + 0.72 N((7, 5), 2.0).    (C.20)

The mobile searcher is initially placed in cell j = 1 with the grid coordinates (1, 1). The search starts at time k = 1 and has a horizon of H = 35 time steps.

Both problems (C.8) and (C.14) give rise to integer linear programs with J × H = 2450 binary variables and 2486 constraints: 35 time-step constraints, 70 single-visit constraints, 70 × 34 = 2380 neighbor constraints and one initial-position constraint. This relatively large problem size dictates a programmatic generation of the problem description, suitable for input into an integer linear program solver. We use the open-source lpSolve package [7] to generate the problem description, and solve it using the Cbc mixed-integer linear program solver [18].

The computed optimal search paths for the minimum expected search time (Figure C.1) and the maximum probability of detection (Figure C.2) show that, in order to minimize the expected time to detection, the searcher visits cells with high a priori target probabilities as early as possible, unlike the maximum probability of detection search. As can be seen in Figure C.3, the cumulative probability of detection for the minimum expected time search increases at a faster rate initially. Reaching the horizon, however, the maximum probability of detection search achieves a higher total cumulative probability of detection, P_π, as expected. In Figure C.3 we compare the results of both search plans to the upper bound resulting from the relaxation of the search path constraint. In this case, where the searcher is free to move to any node within the search space in a single time step, it is well known that the optimal search policy is to visit the cells in descending order of the a priori target probability [103]. This policy is optimal for both the probability of detection and the expected search time.

To demonstrate the effectiveness of the penalized objective function (C.4) for the minimum expected time search, we provide an example of a search using the unpenalized objective function (C.3), which would enable the search planner to minimize the objective function simply by visiting cells with low a priori target probabilities. The resulting search path in Figure C.4 shows that this is indeed the consequence, and Figure C.5 confirms that the unpenalized objective function (C.3) is not suitable for achieving a high probability of detection.
C.4 Conclusion

We have formulated an integer linear program to solve the optimal, path-constrained search in a wireless sensor network with a minimum expected search time objective. We are able to show for the simulated configuration that the expected search time, conditioned on a target located on the search path, is an improvement over the search time obtained by the maximum probability of detection search, at the expense of a very small loss of the total detection probability over the entire search horizon. Over most of the search horizon, the cumulative probability of detection achieved by the minimum expected time search approaches more closely the upper bound given by a search plan with relaxed path constraint.

Unfortunately, as a heuristic to approximate the future expected cost within the framework of Monte Carlo solution methods for online POMDPs, integer programming is not practical due to its computational complexity. The solution times for problems of the size studied here exceed those of the other search time heuristics proposed in Section 3.4.2 by about 4 orders of magnitude.

Figure C.1: Search path of minimum expected search time (C.8). The a-priori target probability is a Gaussian mixture (C.20) over a 10 × 7 search space (J = 70). The search horizon is H = 35. The expected search time is E[K | π] = 17.28, with a cumulative probability of detection of P_π = 0.793.
Figure C.2: Search path of maximum probability of detection (C.14). The a-priori target probability is a Gaussian mixture (C.20) over a 10 × 7 search space (J = 70). The search horizon is H = 35. The expected search time is E[K | π] = 20.05, with a cumulative probability of detection of P_π = 0.804.
[Figure C.3 plots the cumulative probability of detection versus time, with curves for the minimum expected search time, the maximum probability of detection, and the relaxation of the path constraint.]

Figure C.3: Comparison of the cumulative probability of detection, P_π, for the two search policies of minimum expected search time and maximum probability of detection, and for the optimal search without path constraint. The minimum expected time search tends to maximize the probability of detection early, whereas the maximum probability of detection search maximizes over the entire horizon and achieves a higher probability of detection when reaching the horizon, which is evident in this example.
Figure C.4: Search path of minimum expected search time for the unpenalized objective function (C.3). Ignoring the penalty term introduced in (C.4), the search planner would be permitted to minimize the objective function by visiting only cells with low a-priori probability, likely missing the target. Compare with Figure C.1.
[Figure C.5 plots the cumulative probability of detection versus time for the minimum expected search time policy under the unpenalized objective.]

Figure C.5: Cumulative probability of detection for the unpenalized, minimum expected search time objective function (C.3). Ignoring the penalty term introduced in (C.4), the search planner would be permitted to minimize the objective function by visiting only cells with low a-priori probability, so that the cumulative probability of detection remains small. Compare with Figure C.3.
Appendix D
Infotaxis and Mutual Information
D.1 Background
A generalized search strategy referred to as infotaxis was introduced in [112], in the context of localizing a diffusive source in dilute conditions, based on discrete detections of particles emitted by the source. In the framework of infotaxis, the a posteriori belief state b_X for the source position X, conditioned on the history of observations, is updated for every new observation using Bayes' rule. The searcher, currently positioned at y, selects the next action a which maximally reduces the entropy H(X) of the belief state. The entropy reduction problem is posed as an optimal tradeoff between the need to perform exploitative (i.e. greedy) and exploratory actions, motivated by concepts from reinforcement learning [106]. This tradeoff is reflected in the infotactic utility function [112, Equation (1)], which is a weighted sum of terms favoring either greedy or exploratory behavior.
D.2 Proof of Equivalence
We show that the infotactic utility is precisely the mutual information between the
current belief state and myopic, future observations.
Proof. Adapted to our notation, the infotactic utility [112, Equation (1)] for the expected change of entropy after taking action a and observing Z can be written as follows:

    ΔH(X) = b_X(y′) H(X) + [1 − b_X(y′)] Σ_{z∈N_0} ρ_z ΔH(X | Z = z, a).    (D.1)

The ρ_z are the probabilities of detecting z particles, z ∈ N_0, at the next searcher position y′ = y + a. For the purposes of this proof, we do not require further details of the observation model used in [112]. In the framework of infotaxis, the right-hand side of (D.1) is the convex sum of a greedy term, that is, the expected entropy reduction upon the imminent detection of the source, and an exploratory term describing the entropy reduction due to additional particle detections. Let the event of detecting the source be interpreted as another observation, denoted z = D. The conditional entropy after detecting the source is H(X | Z = D, a) = 0, eliminating any uncertainty about the source location. Writing each entropy reduction in (D.1) explicitly, we then have

    ΔH(X) = b_X(y′) [H(X) − H(X | Z = D, a)]
            + [1 − b_X(y′)] Σ_{z∈N_0} ρ_z [H(X) − H(X | Z = z, a)].    (D.2)

We define the augmented set of observations Z = N_0 ∪ {D}, and the corresponding observation probabilities p_Z(z | a) as

    p_Z(z | a) = b_X(y + a)               if z = D,
                 [1 − b_X(y + a)] ρ_z     otherwise.    (D.3)

We can then express (D.2) as

    ΔH(X) = Σ_{z∈Z} p_Z(z | a) [H(X) − H(X | z, a)]    (D.4)
          = H(X) − E_Z[H(X | z, a)].    (D.5)

The RHS of (D.5) is the defining expression of the mutual information [19] between the belief state X and the expected future observation Z, given the action a, that is,

    I(X; Z | a) = H(X) − E_Z[H(X | z, a)].    (D.6)

We have thus shown that the infotactic utility given by [112, Equation (1)] is equivalent to the mutual information.
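
The identity can also be checked numerically on a small belief grid. The sketch below assumes a hypothetical truncated-Poisson count model whose mean decays with the distance between the candidate searcher position and the source (a placeholder only; [112] derives a specific encounter rate), conditions the count probabilities ρ_z on the miss event as implied by (D.3), and evaluates the infotactic utility (D.2) and the mutual information (D.6) by two separate computations; the printed values agree up to rounding.

    # Numerical check that the infotactic utility (D.2) equals the mutual
    # information (D.6); the count model and all parameters are assumptions.
    import numpy as np
    from math import exp, factorial

    rng = np.random.default_rng(1)

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    N = 6                                          # N x N grid of source cells
    cells = [(i, j) for i in range(N) for j in range(N)]
    b = rng.random(len(cells)); b /= b.sum()       # current belief b_X
    y_next = (2, 3)                                # candidate position y' = y + a
    zmax = 8                                       # truncated count support 0..zmax

    def rate(x):                                   # assumed mean count at y' given source at x
        return 2.0 * exp(-0.7 * (abs(x[0] - y_next[0]) + abs(x[1] - y_next[1])))

    lik = np.array([[exp(-rate(x)) * rate(x) ** z / factorial(z)
                     for z in range(zmax + 1)] for x in cells])
    lik /= lik.sum(axis=1, keepdims=True)          # renormalize the truncated support

    idx = cells.index(y_next)
    b_det = b[idx]                                 # b_X(y'), detection probability
    b_miss = b.copy(); b_miss[idx] = 0.0; b_miss /= b_miss.sum()

    rho = b_miss @ lik                             # rho_z, given no detection
    post = (b_miss[:, None] * lik) / rho           # posteriors p(x | z, miss)
    H_post = np.array([entropy(post[:, z]) for z in range(zmax + 1)])

    H0 = entropy(b)
    utility_D2 = b_det * (H0 - 0.0) + (1 - b_det) * (rho * (H0 - H_post)).sum()
    mi_D6 = H0 - (b_det * 0.0 + (1 - b_det) * (rho * H_post).sum())
    print(utility_D2, mi_D6)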
D.3 Conclusion
We conclude that infotaxis is equivalent to search strategies based on a myopic, mutual information utility. Nonetheless, infotaxis offers an interesting viewpoint by showing explicitly that maximizing the mutual information inherently balances exploitation and exploration in an optimal way, when the overall goal is the reduction of the entropy of the belief state.
Appendix E
Pseudocode
Algorithm 1 Pseudocode for a rollout algorithm to compute a heuristic, approximate expected search time, given the single base policy π_base
Require: A, b, π_base, H, W
 1: for all a ∈ A_{y_k} ∖ D do
 2:     ; initialize average search time
 3:     Q[a] := 0
 4:     for i := 1 to W do
 5:         ; simulate system trajectory
 6:         ã := a
 7:         b̃ := b_X
 8:         x̃ ∼ b_X
 9:         ỹ := y_k + a
10:         for n := 1 to H do
11:             Q[a] := Q[a] + (1/W) J(ã)
12:             z ∼ O(x̃, ỹ)
13:             if z = 0 then
14:                 break
15:             end if
16:             b̃ := B(b̃, ã, z)
17:             ã := π_base(b̃)
18:             ỹ := ỹ + ã
19:         end for
20:         if z ≠ 0 then
21:             ; add heuristic, terminal search time
22:             Q[a] := Q[a] + (1/W) J_0(b̃, ã)
23:         end if
24:     end for
25: end for
26: ; select next action
27: a := argmin_a Q[a]
28: return a
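
Algorithm 1 can be read as ordinary Monte Carlo code. The Python sketch below is a minimal rendering with placeholder models: the observation function O, the Bayes update B, the step cost J, the terminal heuristic J_0 and the greedy base policy standing in for π_base are all simplified assumptions (unit step cost, a miss removes the visited cell from the belief, and the terminal cost is the expected Manhattan distance under the remaining belief), not the models of Chapter 3.

    # Minimal sketch of the rollout step of Algorithm 1 with placeholder models.
    import numpy as np

    rng = np.random.default_rng(2)
    N = 8                                           # N x N search grid
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # candidate moves (stand-in for A)

    def inside(y):
        return 0 <= y[0] < N and 0 <= y[1] < N

    def observe(x, y):                              # placeholder for O: 0 means detection
        return 0 if x == y else 1

    def bayes_update(b, y, z):                      # placeholder for B: a miss rules out cell y
        b = b.copy()
        if z != 0:
            b[y] = 0.0
            b = b / b.sum()
        return b

    def step_cost(a):                               # placeholder for J(a): unit time per move
        return 1.0

    def terminal_cost(b, y):                        # placeholder for J_0: expected Manhattan distance
        cells = np.argwhere(b > 0)
        return float(sum(b[tuple(c)] * (abs(c[0] - y[0]) + abs(c[1] - y[1])) for c in cells))

    def base_policy(b, y):                          # greedy move towards the belief mode
        tgt = np.unravel_index(np.argmax(b), b.shape)
        feasible = [a for a in ACTIONS if inside((y[0] + a[0], y[1] + a[1]))]
        return min(feasible, key=lambda a: abs(y[0] + a[0] - tgt[0]) + abs(y[1] + a[1] - tgt[1]))

    def rollout_action(b, y_k, H=20, W=50):
        Q = {}
        for a in ACTIONS:
            if not inside((y_k[0] + a[0], y_k[1] + a[1])):
                continue
            Q[a] = 0.0
            for _ in range(W):                      # simulate W trajectories
                x = tuple(np.unravel_index(rng.choice(b.size, p=b.ravel()), b.shape))
                bb, aa = b, a
                y = (y_k[0] + a[0], y_k[1] + a[1])
                z = 1
                for _ in range(H):
                    Q[a] += step_cost(aa) / W
                    z = observe(x, y)
                    if z == 0:
                        break
                    bb = bayes_update(bb, y, z)
                    aa = base_policy(bb, y)
                    y = (y[0] + aa[0], y[1] + aa[1])
                if z != 0:                          # add heuristic terminal search time
                    Q[a] += terminal_cost(bb, y) / W
        return min(Q, key=Q.get)                    # action with smallest average search time

    belief = np.full((N, N), 1.0 / (N * N))
    print("selected action:", rollout_action(belief, (0, 0)))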
Algorithm 2 Pseudocode for a parallel rollout algorithm to compute a heuristic, approximate expected search time, given the set of base policies Π
Require: A, b, Π, H, W
 1: for all a ∈ A_{y_k} ∖ D do
 2:     ; initialize average search time
 3:     Q[a] := 0
 4:     for i := 1 to W do
 5:         for all π_base ∈ Π do
 6:             ; simulate system trajectory
 7:             J[π_base] := 0
 8:             ã := a
 9:             b̃ := b_X
10:             x̃ ∼ b_X
11:             ỹ := y_k + a
12:             for n := 1 to H do
13:                 J[π_base] := J[π_base] + (1/W) J(ã)
14:                 z ∼ O(x̃, ỹ)
15:                 if z = 0 then
16:                     break
17:                 end if
18:                 b̃ := B(b̃, ã, z)
19:                 ã := π_base(b̃)
20:                 ỹ := ỹ + ã
21:             end for
22:             if z ≠ 0 then
23:                 ; add heuristic, terminal search time
24:                 J[π_base] := J[π_base] + (1/W) J_0(b̃, ã)
25:             end if
26:         end for
27:         Q[a] := Q[a] + min_{π_base∈Π} J[π_base]
28:     end for
29: end for
30: ; select next action
31: a := argmin_a Q[a]
32: return a
Algorithm 3 Pseudocode for rollout algorithm to compute the approximate, nonmyopic mutual information, given the base policy π_base
Require: A, b, π_base, H, W
 1: for all a ∈ A_{y_k} ∖ D do
 2:     ; initialize mutual information
 3:     I[a] := H(b_X)
 4:     for i := 1 to W do
 5:         ; simulate system trajectory
 6:         ã := a
 7:         b̃ := b_X
 8:         x̃ ∼ b_X
 9:         ỹ := y_k + a
10:         for n := 1 to H do
11:             z ∼ O(x̃, ỹ)
12:             if z = 0 then
13:                 break
14:             end if
15:             b̃ := B(b̃, ã, z)
16:             ã := π_base(b̃)
17:             ỹ := ỹ + ã
18:         end for
19:         if z ≠ 0 then
20:             ; subtract posterior entropy
21:             I[a] := I[a] − (1/W) H(b̃)
22:         end if
23:     end for
24: end for
25: ; select next action
26: a := argmax_a I[a]
27: return a