1 INTRODUCTION
Increasingly, several applications require the acquisition
of data from the physical world in a reliable and automatic
manner. This necessity implies the emergence of new kinds
of networks, which are typically composed of low-capacity
devices. Such devices, called sensors, make it possible to
capture and measure specific elements from the physical
world (e.g., temperature, pressure, humidity). Moreover,
they run on small batteries with low energy capacity.
Consequently, their power consumption must be optimized
in order to extend the lifetime of those devices.
During data collection, two mechanisms are used to reduce
energy consumption: message aggregation and filtering of
redundant data. These mechanisms generally use clustering
methods in order to coordinate aggregation and filtering.
Clustering methods belong to one of two categories:
distributed and centralized. The centralized approach
assumes the existence of a particular node that is
cognizant of the information pertaining to the other
network nodes. The problem is then modeled as a graph
partitioning problem with particular constraints that render
it NP-hard, and the central node determines the clusters
by solving this partitioning problem. The major
drawbacks of this category are the additional costs
of communicating the node information to the central node
and the time required to solve the optimization
problem. In the second category, the distributed method,
each node executes a distributed clustering algorithm [7],
[14], [15], [16]. The major drawback of this category is that
nodes have limited knowledge of their neighborhood.
Hence, clusters are not built in an optimal manner.
In [4], Ghiasi et al. propose centralized clustering for
sensor networks. They model this problem as a $k$-means
clustering problem, which is defined as follows [11]: given a
set of $n$ data points in $d$-dimensional space $\mathbb{R}^d$
and an integer $k$, the problem consists of determining a set
of $k$ points in $\mathbb{R}^d$, called centers, that minimizes
the mean squared distance from each data point to its nearest
center. Heinzelman et al. [7] propose a centralized version of
Low Energy Adaptive Clustering Hierarchy (LEACH), their data
collection protocol, in order to produce better clusters by
dispersing cluster head nodes throughout the network. In this
protocol, each node sends its current location and energy level
to the sink node, which computes the nodes' mean energy level;
nodes whose energy level is below this average cannot become
cluster heads for the current round. Considering the remaining
nodes as possible cluster heads, the sink node searches for
optimal clusters using the simulated annealing algorithm [1].
This algorithm attempts to minimize the amount of energy
required for non-cluster-head nodes to transmit their data to
the cluster head, by minimizing the sum of squared
distances between all non-cluster-head nodes and the closest
cluster head.
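The objective shared by the $k$-means formulation and by centralized LEACH can be illustrated with a few lines of Python. This is only a sketch of the objective function, not of either clustering algorithm; the function name `mean_squared_distance` and the sample coordinates are hypothetical.

```python
def mean_squared_distance(points, centers):
    """Mean, over all points, of the squared distance to the nearest center."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

# Hypothetical node positions and two candidate sets of k = 2 centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
spread_centers = [(0.0, 1.0), (10.0, 1.0)]   # one center per group of nodes
poor_centers = [(5.0, 1.0), (5.0, 50.0)]     # both centers far from the nodes

print(mean_squared_distance(points, spread_centers))
print(mean_squared_distance(points, poor_centers))
```

Centers (or cluster heads) that are dispersed among the nodes give a lower objective value than centers placed far from them, which is the intuition behind dispersing cluster heads throughout the network.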
The energy map, the component that holds information
concerning the remaining energy available in all network
areas, can be used to prolong the network's lifetime [7]. In
their probabilistic model for energy consumption,
Heinzelman et al. [7] claim that each sensor node can be
modeled by a Markov chain. They provide an equation that
each node can use to calculate its energy dissipation
rate, $E^T$, for the next $T$ time steps. Together with the
remaining energy, the value $E^T$ can be sent to the sink node
for energy map building purposes.
This paper proposes a new centralized clustering
mechanism equipped with energy maps and constrained
by Quality-of-Service (QoS) requirements. Such a clustering
mechanism is used to collect data in sensor networks. The
first original aspect of this investigation consists of adding
these constraints to the clustering mechanism that helps the
data collection algorithm, in order to reduce energy
consumption and provide applications with the information
required without burdening them with unnecessary data.
Centralized clustering is modeled as hypergraph partitioning,
and a tabu search heuristic is proposed to solve this problem.
The existing centralized clustering methods cannot be used
here because our model assumes that the numbers of clusters
and cluster heads are unknown before clusters are created,
which constitutes another major original facet of this paper.
The remainder of this paper is organized as follows:
Section 2 summarizes the data collection mechanism.
Section 3 outlines the problem formulation. Section 4 describes
the tabu search adaptation. Computational experiments and
results are reported in Section 5. Section 6 concludes this
paper and delineates some of the remaining challenges.
IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL 2009 433
The authors are with the Mobile Computing and Networking Research
Laboratory (LARIM), Department of Computer Engineering, Ecole
Polytechnique de Montreal, C.P. 6079, Station succ. Centreville,
Montreal, QC H3C 3A7, Canada.
E-mail: {abdelmorhit.elrhazi, samuel.pierre}@polymtl.ca.
Manuscript received 13 Aug. 2007; revised 9 Apr. 2008; accepted 20 Aug.
2008; published online 4 Sept. 2008.
For information on obtaining reprints of this article, please send e-mail to:
tmc@computer.org, and reference IEEECS Log Number TMC2007080243.
Digital Object Identifier no. 10.1109/TMC.2008.125.
1536-1233/09/$25.00 2009 IEEE. Published by the IEEE CS, CASS, ComSoc, IES, & SPS
2 DATA COLLECTION MECHANISM
Generally, sensor networks contain a large number of
nodes that collect measurements before sending them to the
applications. If all nodes forwarded their measurements,
the volume of data received by the applications would
grow rapidly with the network size, rendering data processing
a tedious task. A sensor system should thus contain mechanisms that
allow the applications to express their requirements in
terms of the required quality of data. Data aggregation and
data filtering are two methods that reduce the quantity of
data received by applications. The aim of those two
methods is not only to minimize the energy consumption
by decreasing the number of messages exchanged in the
network but also to provide the applications with the
needed data without needlessly overloading them with
exorbitant quantities of messages.
The data aggregation mechanism allows for the gathering
of several measurements into one record whose size is smaller
than the combined size of the initial records. However, the
semantics of the result must not contradict the semantics of
the initial records, nor lose their meaning.
The data filtering mechanism makes it possible to
ignore measurements that are redundant or irrelevant
to the application needs. A sensor system provides the
applications with the means to express the criteria used to
determine measurement relevancy, e.g., an application
could be concerned with temperatures that are 1) lower
than a given value and 2) recorded within a delimited zone.
The sensor system filters the network messages and
forwards only those that satisfy the filter conditions.
Applications that use sensor networks are generally
concerned with the node measurements within a certain
period of time. Hence, the most important key indicators in
sensor networks are the quality of the measurements and
the network lifetime. An application designed to record the
mean temperature in zones where the sensors are deployed
could be associated with a set of requirements in terms of
measurement frequency (e.g., the sensor system must record a
measurement every 15 minutes), in terms of a measurement
discrepancy threshold (e.g., the sensor system must ignore
data that differ from the previous value by less than
10 percent), and in terms of the sensor lifetime (e.g.,
measurements must be provided for one year).
In [3], we propose a novel data collection approach for
sensor networks that uses energy maps and QoS requirements
to reduce power consumption while increasing
network coverage. The mechanism comprises two phases.
During the first phase, the applications specify their QoS
requirements regarding the data they need.
They send their requests to a particular node $s$, called
the collector node, which receives the application query and
obtains results from other nodes before returning them to
the applications. The collector node builds the clusters
optimally, using the QoS requirements and the energy map
information. During the second phase, the cluster heads
must provide the collector node with combined measurements
for each period. The cluster head is in charge of
various activities: coordinating the data collection within its
cluster, filtering redundant measurements, computing
aggregate functions, and sending results to the collector node.
3 PROBLEM FORMULATION
The considered network contains a set $V$ of $n$ stationary
nodes whose locations are known. The communication
model is multihop, which means that
certain nodes cannot send measurements directly to the
collector node: they must rely on their neighbors to relay them.
An application can specify the following QoS requirements:
1. Data collection frequency, $f$. The network provides
results to the application every time the duration $f$
expires.
2. A measurement uncertainty threshold, $mut$. If the
difference between two simultaneous measurements
from two different nodes in the same zone (fourth
requirement) is less than $mut$, then one of them is
considered redundant.
3. A query duration, $T$. The network is required to run
the query for a total time equal to $T$.
4. A zone size, $step$. The $step$ value determines the zone
length. Within a single zone, measurements are
considered redundant. If an application requires
more precision, it could decrease the $step$ value or
even omit it.
The goal of the clustering algorithm is to 1) split the
network nodes into a set of clusters $C_i$ that satisfies the
application requirements, 2) reduce energy consumption,
and 3) prolong the network lifetime. Clusters are built
according to the following criteria:
• Maximize network coverage using the energy map;
• Gather nodes likely to hold redundant measurements;
• Gather nodes located within the same zone delimited
by the application.
Based on those criteria, the cluster building problem (CBP)
considered in the remainder of this paper consists of
determining the set of clusters $C_i$ that fulfills the
following conditions:
1. $\bigcup_i C_i = V \;\wedge\; \bigcap_i C_i = \emptyset$. (1)
2. $\forall N_j, N_k \in C_i:\ |N_j^t - N_k^t| \le mut$. (2)
3. $\forall N_j, N_k \in C_i:\ d_{j,k} \le step$. (3)
4. $\exists N_j \in C_i:\ E_j^T \le E_j^{remaining}$. (4)
Here, $C_i$ consists of a cluster that contains a set of nodes
and a particular node that represents the cluster head. $N_j^t$
represents the reading of node $N_j$ during the time slot $t$;
$d_{j,k}$ corresponds to the distance between nodes $N_j$ and
$N_k$; $E_j^T$ is equal to the estimated energy dissipation of
node $N_j$ during period $T$; $E_j^{remaining}$ is the energy
remaining in node $N_j$ when the cluster building algorithm
starts running.
The first condition makes it possible to structure nodes
into disjoint sets. The second condition permits the
gathering of nodes that are likely to record redundant
measurements that will be filtered by the network nodes.
This condition can be verified using a current node
measurement or previously taken mean measurements. In the
latter case, the sensor node must be able to store measurement
means. The third condition compels the gathering of nodes
located in zones whose lengths are determined by the $step$
value supplied by the application. The fourth condition ensures
that each cluster contains at least one node that guarantees the
coverage of the entire zone during the query run time.
CBP is considered as a hypergraph partitioning problem.
The network nodes are modeled as a hypergraph with a
vertex set $V = \{1, \ldots, n\}$. The hyperedges belong to a
set of clusters $C = \{C_1, \ldots, C_m\}$. Additionally, let us define
• $c_j$: the cost of cluster $C_j$;
• $d_{j,s}$: the distance between node $j$ and the collector
node $s$;
• $a_{i,j} = 1$ if cluster $C_j$ contains node $i$, 0 otherwise;
• $e_j = 1$ if the energy level of node $j$ satisfies
$E_j^T \le E_j^{remaining}$, 0 otherwise.
The binary decision variables are given as follows:
• $x_j = 1$ if cluster $C_j$ is used for partitioning, 0 otherwise.
Hence, the CBP formulation can be expressed as
$$\text{minimize} \quad \sum_{j=1}^{m} c_j x_j \tag{5}$$
subject to
$$\sum_{j=1}^{m} a_{i,j}\, x_j = 1, \quad \text{for } i = 1, \ldots, n, \tag{6}$$
$$x_j \in \{0, 1\}, \quad \text{for } j = 1, \ldots, m, \tag{7}$$
$$|N_k^t - N_l^t| \le mut, \quad \forall k, l \in C_j, \ \text{for } j = 1, \ldots, m, \tag{8}$$
$$d_{k,l} \le step, \quad \forall k, l \in C_j, \ \text{for } j = 1, \ldots, m, \tag{9}$$
$$\sum_{i=1}^{n} a_{i,j}\, e_i \ge 1, \quad \text{for } j = 1, \ldots, m. \tag{10}$$
Equation (6) ensures that each node is included in a
single cluster; this is called a partitioning constraint.
Equations (8), (9), and (10) represent the cluster building
criteria (2), (3), and (4), respectively.
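To make the partitioning constraint (6) concrete, the following Python sketch enumerates every subset of a small list of candidate clusters and keeps the cheapest one that covers each node exactly once. It is an exhaustive illustration for tiny instances only, not the method used in this paper; the function name `best_partition`, the candidate clusters, and their costs are hypothetical, and the candidates are assumed to already satisfy (8), (9), and (10).

```python
from itertools import combinations

def best_partition(n_nodes, clusters, costs):
    """Return (total cost, chosen cluster indices) of the cheapest exact partition."""
    best = None
    for r in range(1, len(clusters) + 1):
        for chosen in combinations(range(len(clusters)), r):
            covered = [v for j in chosen for v in clusters[j]]
            # Constraint (6): every node belongs to exactly one chosen cluster.
            if sorted(covered) == list(range(n_nodes)):
                total = sum(costs[j] for j in chosen)
                if best is None or total < best[0]:
                    best = (total, chosen)
    return best

# Hypothetical instance: 4 nodes, 6 candidate clusters with made-up costs.
clusters = [{0, 1}, {2, 3}, {0, 1, 2}, {3}, {0}, {1, 2, 3}]
costs = [3, 3, 4, 1, 2, 5]
print(best_partition(4, clusters, costs))
```

The number of subsets grows exponentially with the number of candidate clusters, which is why an exhaustive search of this kind does not scale beyond toy instances.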
The objective of the cluster building phase is to minimize
energy dissipation when collecting node data. The cost $c_j$ of
a cluster $C_j$ should reflect this objective. The cost should be
composed of two major terms. The first one represents the
energy consumption due to the cluster head duties. Indeed,
the cluster head is responsible for data collection and
aggregation, as well as the transmission of the measurements
of its cluster. The second term represents the energy saved due
to the fact that messages of other cluster nodes will be
filtered by the data collection mechanism.
Generally, the energy consumed by a node for a single
cycle is expressed as follows [8]:
$$E_{cycle} = E_P + E_S + E_T + E_R, \tag{11}$$
where $E_P$, $E_S$, $E_T$, and $E_R$ represent the energy required
for data processing, sensing, transmitting, and receiving per
cycle time, respectively. The quantity of energy spent for
each operation depends on the network and the event
model. Roughly, (11) is approximated by its preponderant
term, $E_T$ [9]. When sending 1 bit from node $u$ to node $v$, the
energy consumed is expressed as follows [2]:
$$E_{u,v} = E_{elec} + \epsilon_{amp}\, d_{u,v}^{\gamma}, \quad \gamma \ge 2.0. \tag{12}$$
Here, the factor $\gamma$ indicates the path loss exponent and
depends on the communication channel as well as environmental
conditions. $E_{elec}$ denotes the energy dissipated by the
transmitter electronics, and $\epsilon_{amp}$ denotes a constant
parameter that characterizes the transmitter amplification.
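As a small numerical illustration of the per-bit transmission model, the sketch below evaluates the electronics term plus the amplifier term raised to the path loss exponent. The function name `tx_energy_per_bit` and the default constants (50 nJ/bit and 100 pJ/bit/m^2) are assumptions chosen for illustration only; they are not values taken from this paper.

```python
def tx_energy_per_bit(distance_m, e_elec=50e-9, eps_amp=100e-12, gamma=2.0):
    """Energy (in joules) to send one bit over distance_m meters.

    e_elec, eps_amp, and gamma are illustrative assumptions, not measured values.
    """
    if gamma < 2.0:
        raise ValueError("the path loss exponent is assumed to be at least 2.0")
    return e_elec + eps_amp * distance_m ** gamma

for d in (10.0, 50.0, 100.0):
    print(d, tx_energy_per_bit(d))
```

Because the amplifier term grows as a power of the distance, long links dominate the energy budget, which motivates expressing cluster costs in terms of distances to the sink.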
The communication model considered is a multihop model.
Consequently, to express the energy consumed by
communicating the measurements of node $i$ to the sink
node $s$, the equation must consider the communication
links between each pair of nodes found on the path between
$i$ and $s$. However, such an equation would be
difficult to compute as it depends on the routing protocol.
For this reason, only the distance that separates nodes $i$ and
$s$ is considered. This simplification is justified by the fact
that, in a dense network, routing protocols can find a path
between $i$ and $s$ that is close to the straight line joining
$i$ and $s$. The simplification is also valid in single-hop
communication models.
Consequently, the cost $c_j$ of cluster $C_j$ is expressed as
follows:
$$c_j = \alpha\, d_{j,s}^{\gamma} - \beta \sum_{i=1}^{n_j} d_{i,s}^{\gamma}, \tag{13}$$
where $n_j$ indicates the number of nodes included in
cluster $C_j$ that are not currently its cluster head, $d_{j,s}$
represents the distance between the head of cluster $C_j$ and the
sink node, and $\alpha$ and $\beta$ are two positive coefficients.
The first term of (13) represents the estimated energy consumed by
the cluster head to communicate the collective measurements.
The second term represents the total energy saved
by the nodes of cluster $C_j$ through the filtering of their messages.
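The cluster cost can be sketched in Python as follows, assuming the form "head term minus saved-energy term" described above. The function name `cluster_cost`, the coefficient values, and the distances are hypothetical.

```python
def cluster_cost(head, members, d_to_sink, alpha=1.0, beta=0.5, gamma=2.0):
    """Cost of one cluster: head transmission term minus the energy saved by
    the other members, whose messages are filtered.

    alpha, beta, gamma are illustrative values, not those used in the paper.
    """
    saved = sum(d_to_sink[i] ** gamma for i in members if i != head)
    return alpha * d_to_sink[head] ** gamma - beta * saved

# Hypothetical distances (in arbitrary units) from three nodes to the sink.
d_to_sink = {0: 2.0, 1: 3.0, 2: 4.0}
members = {0, 1, 2}
print(cluster_cost(0, members, d_to_sink))  # head closest to the sink
print(cluster_cost(2, members, d_to_sink))  # head farthest from the sink
```

With these sample values, electing the member closest to the sink as cluster head gives the lower cost, since the head term grows with the head's distance to the sink.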
The problem thus consists of minimizing
$\sum_{j=1}^{m} \left( \alpha\, d_{j,s}^{\gamma} - \beta \sum_{i=1}^{n_j} d_{i,s}^{\gamma} \right) x_j$
given (6)-(10). When formulated that way, the
problem is NP-hard since it is modeled as a
partitioning problem with additional constraints and the
partitioning problem is known to be NP-hard [6].
Consequently, no polynomial-time algorithm is known for CBP.
This explains why a tabu search heuristic was
adopted in order to find good solutions.
EL RHAZI AND PIERRE: A TABU SEARCH ALGORITHM FOR CLUSTER BUILDING IN WIRELESS SENSOR NETWORKS 435
4 A TABU SEARCH APPROACH
In order to facilitate the usage of tabu search for CBP, a new
graph called $G_i$ is defined, from which feasible clusters can
be determined. A feasible cluster consists of a set of nodes
that fulfill the cluster building constraints (8), (9), and (10).
Nodes that satisfy Constraint (10), i.e., ensure zone coverage,
are called active nodes. The vertices of $G_i$ represent the
network nodes. An edge $(i, j)$ is defined in graph $G_i$
between nodes $i$ and $j$ if they satisfy Constraints (8) and (9).
Consequently, a clique in $G_i$ embodies a
feasible cluster. A clique consists of a set of nodes that are
pairwise adjacent.
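The construction of this graph and the clique test can be sketched in Python. The paper's implementation uses C++ and the Boost Graph Library; the sketch below instead uses plain dictionaries, and the function names `build_graph` and `is_clique` as well as the sample readings and positions are hypothetical.

```python
from itertools import combinations
from math import dist

def build_graph(reading, pos, mut, step):
    """Adjacency sets: i and j are linked if they satisfy Constraints (8) and (9)."""
    adj = {v: set() for v in reading}
    for i, j in combinations(reading, 2):
        similar = abs(reading[i] - reading[j]) <= mut   # Constraint (8)
        close = dist(pos[i], pos[j]) <= step            # Constraint (9)
        if similar and close:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def is_clique(adj, nodes):
    """True if every pair of the given nodes is adjacent."""
    return all(j in adj[i] for i, j in combinations(nodes, 2))

# Hypothetical readings (e.g., temperatures) and planar positions.
reading = {0: 20.0, 1: 20.5, 2: 25.0}
pos = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (1.0, 0.0)}
adj = build_graph(reading, pos, mut=1.0, step=6.0)
print(adj, is_clique(adj, [0, 1]), is_clique(adj, [0, 1, 2]))
```

Node 2 is close to node 0 but its reading differs too much, so it stays isolated and cannot join the cluster formed by nodes 0 and 1 without breaking the clique property.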
Five steps should be conducted in order to adapt the tabu
search heuristic to a particular problem:
1. design an algorithm that returns an initial solution,
2. define moves $m$ that determine the neighborhood
$N(s)$ of a solution $s$,
3. determine the content and size of tabu lists,
4. define the aspiration criteria,
5. design intensification and diversification
mechanisms.
The algorithm ends when one of the following three
conditions occurs:
1. All possible moves are prohibited by the tabu lists;
2. The maximal number of iterations allowed has been
reached;
3. The maximal number of consecutive iterations without
improvement of the best solution has been
reached.
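The overall loop and its three stopping conditions can be sketched generically in Python. This is a simplified skeleton under stated assumptions, not the algorithm of this paper: the tabu list stores whole solutions rather than moves, a simple aspiration rule is included, and the names `tabu_search`, `neighbors`, and `cost` as well as the toy problem are hypothetical.

```python
from collections import deque

def tabu_search(initial, neighbors, cost, tabu_size=5, max_iter=100, max_no_improve=20):
    current = best = initial
    tabu = deque([initial], maxlen=tabu_size)   # oldest entries are dropped first
    stale = 0
    for _ in range(max_iter):                   # stopping condition 2
        candidates = [s for s in neighbors(current)
                      if s not in tabu or cost(s) < cost(best)]  # aspiration
        if not candidates:                      # stopping condition 1
            break
        current = min(candidates, key=cost)     # best admissible neighbor
        tabu.append(current)
        if cost(current) < cost(best):
            best, stale = current, 0
        else:
            stale += 1
            if stale >= max_no_improve:         # stopping condition 3
                break
    return best

# Toy problem: minimize (x - 7)^2 over the integers, moving by +1 or -1.
result = tabu_search(0, lambda x: [x - 1, x + 1], lambda x: (x - 7) ** 2)
print(result)
```

Note that the search keeps moving after reaching the minimum, accepting worse neighbors because the better ones are tabu; only the best solution seen is returned.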
4.1 Initial Solution
The goal is to find an appropriate initial solution for the
problem, so that the tabu search iterations reach the best
solution within a reasonable time. The algorithm depicted
in Fig. 1 is proposed. It starts by sorting active nodes in
decreasing order of their degree in graph $G_i$. At each
iteration, the first active node $i$ not yet covered by the
initial solution is selected. The algorithm determines the
largest clique that contains the selected active node $i$ and
nodes adjacent to it in graph $G_i$ that have yet to be covered
by the initial solution. This clique is considered a new
cluster and node $i$ becomes the cluster head.
The algorithm does not ensure that all nonactive nodes
are assigned to a cluster. Consequently, if a node $i$ is not
covered by any cluster when the algorithm ends, it is
assigned to a cluster whose head is adjacent to node $i$.
However, the resulting initial solution may not be feasible,
i.e., the nodes of at least one cluster may not form a clique
in the graph $G_i$. A penalty equation to
evaluate a solution is proposed in the following sections.
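The initial solution procedure can be sketched as follows. Finding the largest clique is itself a hard problem, so this sketch substitutes a greedy clique extension for it; that substitution, the function name `initial_solution`, and the sample graph are assumptions made for illustration and are not taken from Fig. 1.

```python
def initial_solution(adj, active):
    """adj: dict node -> set of adjacent nodes; active: set of active nodes.

    Returns a dict mapping each cluster head to the set of nodes in its cluster.
    """
    covered, clusters = set(), {}
    # Active nodes in decreasing order of degree (ties broken by node id).
    for head in sorted(active, key=lambda v: (-len(adj[v]), v)):
        if head in covered:
            continue
        clique = [head]
        for v in sorted(adj[head] - covered):
            if all(v in adj[u] for u in clique):   # greedy clique extension
                clique.append(v)
        clusters[head] = set(clique)
        covered.update(clique)
    # Leftover nodes join a cluster whose head is adjacent to them,
    # which may break the clique property of that cluster.
    for v in sorted(set(adj) - covered):
        heads = [h for h in clusters if h in adj[v]]
        if heads:
            clusters[heads[0]].add(v)
    return clusters

# Hypothetical graph: a triangle {0, 1, 2} and a star centered on node 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3}, 5: {3}}
clusters = initial_solution(adj, active={0, 3})
print(clusters)
```

In this example, node 5 ends up in the cluster headed by node 3 even though it is not adjacent to node 4, so that cluster is not a clique; this is the kind of infeasibility the penalty term is meant to handle.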
4.2 The Neighborhood $N(s)$ Definition
The definition of the neighborhood $N(s)$ of a solution $s$ is a
crucial step as it determines the final quality of the solution
and has a direct impact on the execution time. Two types of
moves are distinguished: the first involves an
ordinary node, i.e., a nonactive node, and the second
involves an active node. This distinction is due to the fact
that an active node could be a cluster head and thus build a
new cluster. Furthermore, a third type of move, which involves
a cluster head and allows an existing cluster to be removed
from a solution, is also considered.
1. A Move Involving a Regular Node. Let $s$ represent
the solution analyzed at each iteration. Solution $s$
consists of a set of clusters. Let $a$ be a regular node.
The first type of move $m(s, a)$ is defined as follows:
assume that node $a$ is assigned to cluster $C_i$ in $s$.
Move $m(s, a)$ removes node $a$ from cluster $C_i$ and
assigns it to another cluster $C_j$ ($C_i \ne C_j$) whose
head is adjacent to node $a$.
2. A Move Involving an Active Node. This second
type of move relies on the fact that an active node
could be a cluster head. Let $a$ represent an active
node. Move $m(s, a)$ consists of either
a. Reassigning node $a$ to a cluster whose head is
adjacent to $a$. This move is similar to those of the
first type, since an active node could be included
in a cluster without becoming its head.
b. Selecting node $a$ to become the head of the cluster to
which it is assigned. The previous cluster head
remains assigned to this cluster, although it is
no longer the cluster head. Consequently, the cost of
this cluster is affected as it relies on the
head coordinates.
3. A Move Involving a Cluster Head. The third type of
move involves the head of an existing cluster. Let $a$
be the head of a cluster $C_i$ that is otherwise empty,
i.e., $C_i$ contains only its head $a$. Move $m(s, a)$
consists of removing cluster $C_i$ from solution $s$ and
assigning node $a$ to a cluster whose head is adjacent to $a$.
These three types of moves engender a variety of
solutions by producing several combinations of clusters
and cluster heads. They also create solutions whose sizes
vary. However, they can produce clusters that are not
necessarily cliques since node $a$ is reassigned
without verifying whether the resulting cluster is a clique
or not. A cluster penalty is therefore defined and added to the
cluster cost in order to compare the solutions of $N(s)$. The
penalty $P_a$ of a node $a$ assigned to cluster $C_i$ is the
number of nodes in $C_i$ that are not adjacent to node $a$.
Consequently, function $f'$ makes it possible to compare the
elements of neighborhood $N(s)$. It is expressed as follows:
$$f'(m(s, a)) = \sum_{j=1}^{|s|} c_j x_j + \sigma P_a. \tag{14}$$
Fig. 1. Initial solution algorithm.
In this formula, $\sigma$, called the penalty coefficient, is
positive. More details are provided later in this
paper. $|s|$ represents the number of clusters in solution $s$.
Function $f'(m(s, a))$ contains two parts: the first is equal to
the total of cluster costs and the second represents the
penalty caused by the move. At each iteration, this function
is used to compare the solutions of $N(s)$. To perform this
evaluation quickly, it suffices to calculate the gain of a
solution in $N(s)$ compared to solution $s$, without
recalculating the value of $f'$ at each iteration. This is
possible since move $m(s, a)$ affects at most two clusters.
Hence, we define the gain $Gain(m(s, a))$ associated with
move $m(s, a)$ as follows:
1. If move $m(s, a)$ consists of reassigning node $a$ from
cluster $C_i$ to cluster $C_j$, the gain is
$$Gain(m(s, a)) = \sigma \left( P_{a,j} - P_{a,i} \right). \tag{15}$$
$P_{a,j}$ represents the penalty for assigning node $a$ to
cluster $C_j$. The cluster costs disappear from (15) as
they cancel each other out.
2. If move $m(s, i)$ consists of selecting node $i$ as the
head of the cluster $C_j$ to which it is assigned, the gain
is expressed as follows:
$$Gain(m(s, i)) = \left( \alpha\, d_{i,s}^{\gamma} - \beta\, d_{t,s}^{\gamma} \right) - \left( \alpha\, d_{t,s}^{\gamma} - \beta\, d_{i,s}^{\gamma} \right). \tag{16}$$
Node $t$ denotes the current head of cluster $C_j$. The
result follows from (13) and (14). The
penalty values are identical as node $i$ does not
change clusters.
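The two incremental gains can be checked numerically with a short Python sketch. It assumes that a cluster cost has the form of a head term weighted by one coefficient minus a saved-energy term weighted by another, both using distances to the sink raised to the path loss exponent, and that the gain is the new value minus the old one. All function names, coefficient values, and sample data are hypothetical.

```python
def cluster_cost(head, members, d, alpha, beta, gamma):
    """Head term minus the energy saved by the non-head members."""
    saved = sum(d[i] ** gamma for i in members if i != head)
    return alpha * d[head] ** gamma - beta * saved

def penalty(node, members, adj):
    """Number of cluster members that are not adjacent to the node."""
    return sum(1 for v in members if v != node and v not in adj[node])

def gain_reassign(a, src, dst, adj, sigma):
    """Gain of moving non-head node a from cluster src to cluster dst."""
    return sigma * (penalty(a, dst, adj) - penalty(a, src, adj))

def gain_reelect(i, t, d, alpha, beta, gamma):
    """Gain of electing node i as head in place of the current head t."""
    return ((alpha * d[i] ** gamma - beta * d[t] ** gamma)
            - (alpha * d[t] ** gamma - beta * d[i] ** gamma))

# Re-election: compare the incremental gain with a full cost recomputation.
d = {0: 2.0, 1: 3.0, 2: 4.0}
members = {0, 1, 2}
before = cluster_cost(0, members, d, 1.0, 0.5, 2.0)
after = cluster_cost(2, members, d, 1.0, 0.5, 2.0)
print(after - before, gain_reelect(2, 0, d, 1.0, 0.5, 2.0))

# Reassignment: node 3 is adjacent to nodes 0 and 4 only.
adj = {3: {0, 4}}
print(gain_reassign(3, src={3, 4}, dst={0, 1, 2}, adj=adj, sigma=1.5))
```

The re-election gain matches the difference of the two full cluster costs, which is the property that lets each iteration evaluate a move without recomputing the whole objective.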
4.3 Tabu List and Aspiration Criteria
Occasionally, tabu search methods accept solutions that do
not improve the objective function, in the hope of reaching
improved solutions in the future. However, accepting
non-improving solutions introduces a risk of cycling,
i.e., of returning to previously considered solutions; hence
the idea of keeping a tabu list to keep track of the solutions
that have been considered in the past. Thus, when
generating the neighborhood candidates, the solutions that
appear in the tabu list are removed. Our adaptation
proposes two tabu lists: a reassignment list and a reelection
list. The first tabu list prevents cycles that can be generated
by reassigning a node to the same cluster. After each
move $m(s, a)$ that consists of reassigning node $a$ to
cluster $C_i$, the pair ($a$, head of $C_i$) is added to this
tabu list. The second tabu list prevents the reelection of an
active node in the same cluster. After a move $m(s, a)$
consisting of electing node $a$ in cluster $C_i$, two pairs of
nodes are added to the reelection list: the first pair
($a$, head of $C_i$) prohibits the move $m(s, a)$ and the second
pair (head of $C_i$, $a$) prevents the reverse move. The
reassignment tabu list is initialized with the pairs that
represent the initial solution before the iterations start.
Such a strategy prevents the return to the initial solution.
Using a tabu list could drastically restrict the neighborhood
$N(s)$. Moreover, it could cause the search to miss certain
attractive solutions. Consequently, tabu search methods allow
the violation of tabu list rules through the definition of an
aspiration criterion. Our proposal opts for the most widely used
aspiration criterion, which consists of accepting a move
listed in the tabu list if it yields a
solution that is better than the best solution found so far.
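The two tabu lists and the aspiration criterion can be sketched with fixed-length queues. This is a minimal illustration: the class name `TabuLists`, its method names, and the interpretation that the pair stored after an election uses the previous head of the cluster are assumptions, not details confirmed by this paper.

```python
from collections import deque

class TabuLists:
    """Reassignment and reelection lists of (node, head) pairs."""

    def __init__(self, size, initial_pairs=()):
        # The reassignment list starts with the pairs of the initial solution.
        self.reassign = deque(initial_pairs, maxlen=size)
        self.reelect = deque(maxlen=size)

    def record_reassign(self, node, head):
        self.reassign.append((node, head))

    def record_reelect(self, node, old_head):
        # Forbid both the same election and the reverse one (assumed reading).
        self.reelect.append((node, old_head))
        self.reelect.append((old_head, node))

    def allowed(self, kind, pair, candidate_cost, best_cost):
        tabu = self.reassign if kind == "reassign" else self.reelect
        # Aspiration: a tabu move is accepted if it beats the best solution so far.
        return pair not in tabu or candidate_cost < best_cost

lists = TabuLists(size=4, initial_pairs=[(1, 0)])
lists.record_reelect(2, 0)
print(lists.allowed("reassign", (1, 0), candidate_cost=10.0, best_cost=5.0))
print(lists.allowed("reassign", (1, 0), candidate_cost=4.0, best_cost=5.0))
```

Using `deque` with `maxlen` makes the oldest pairs expire automatically once the list is full, which is one simple way to bound the tabu tenure.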
4.4 Diversification and Intensification
Diversification and intensification are two mechanisms that
make it possible to improve tabu search methods. They start
by analyzing the good solutions visited and extracting
their common properties in order to intensify the
search in another neighborhood or to diversify the search.
In tabu search, this mechanism is called long-term memory.
Our proposal uses a technique called the shifting penalty tactic,
which is an instance of a procedure called strategic oscillation,
representing one of the basic diversification approaches for
tabu search [5]. The approach consists of directing the
search toward and away from selected boundaries of
feasibility, either by manipulating the objective function
(e.g., with penalties or incentives, as the case may be) or
simply by compelling the choice of moves that lead the search
in specified directions. In this particular case, the penalty
coefficient is determined dynamically, according to the
violation of the clique constraint. Thus, if the solution
selected at the end of an iteration contains a cluster with
nonnull penalties, i.e., a cluster that is not a clique,
the penalty coefficient value is increased; otherwise, it is
decreased.
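The shifting penalty rule can be sketched in a few lines of Python. The multiplicative update, the factor of 2, and the lower and upper bounds are assumptions chosen for illustration; the paper does not specify how much the coefficient changes at each iteration.

```python
def update_penalty_coefficient(sigma, cluster_penalties, factor=2.0,
                               lower=1e-3, upper=1e6):
    """Raise sigma when the selected solution violates the clique constraint,
    lower it otherwise (factor and bounds are illustrative assumptions)."""
    if any(p > 0 for p in cluster_penalties):
        sigma *= factor       # infeasible: push the search back toward feasibility
    else:
        sigma /= factor       # feasible: allow the search to cross the boundary
    return min(max(sigma, lower), upper)

sigma = 1.0
for penalties in ([0, 2], [1, 0], [0, 0]):   # hypothetical per-cluster penalties
    sigma = update_penalty_coefficient(sigma, penalties)
    print(sigma)
```

Alternating between increases and decreases makes the search oscillate around the feasibility boundary instead of staying strictly inside or outside it.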
5 COMPUTATIONAL EXPERIENCE AND RESULTS
In order to evaluate the performance of these novel
algorithms, they were implemented in C++ with
the Boost Graph Library (BGL) [10] and tested with
sensor networks of different sizes and topologies. On the
one hand, several experiments were conducted to evaluate
the impact of the tabu search parameters. On the
other hand, another approach, based on CPLEX [18], was
devised to solve the partitioning problem in order to
assess the quality of the solutions found by tabu search.
This new method had to be designed and implemented because
the existing approaches for clustering problems cannot be
used in this case, as explained at the beginning of this
paper. The algorithms were run on a Linux server
(Red Hat 3.4.42) equipped with a 1.80-GHz Intel Pentium 4
processor and 1 Gbyte of memory.
5.1 Analysis of the Impact of Tabu Search Parameters
The size of the tabu list has a direct impact on the quality of
the solution. Hence, it is important to analyze its impact in
order to adjust its value accordingly. Results reported in [5]
show that determining the tabu list size dynamically is
more efficient than fixing its value during the iterations.
This experiment involves a sensor network composed of
100 nodes. A square topology is used, i.e., nodes are placed on
the vertices of squares that cover the entire network area. To
facilitate the analysis, it is assumed that all nodes are active.
Only the size of the tabu list varies, and the values of all
other parameters are fixed, e.g., the maximum number of
iterations allowed is set to 1,000. Fig. 2 illustrates the results
of the investigation.
The results corroborate those described in [5]. Indeed,
the best results are obtained using tabu lists whose size is
similar to the number of nodes, i.e., between 75 and 120.
Large tabu list sizes hinder the quality of the solution as
they reduce the number of solutions visited by forbidding
additional moves. Low values also hinder quality because
they allow cycles. Thus, we determine that it is best to set
the size of the tabu list dynamically within the interval
$[0.75N, 1.1N]$ ($N$ indicates the number of nodes).
A second experiment was conducted in order to quantify
the impact of the maximum number of allowed
iterations on the quality of the solution found and to
determine the limits of the solution enhancement. A sensor
network that comprises 1,000 nodes placed in a square
topology is considered. The maximum number of iterations
is varied and the other parameters are fixed, except for the
maximum number of iterations without enhancement of the
best solution, which takes a value that allows
the algorithm to reach the maximum number of iterations.
Fig. 3 shows the impact of this parameter on the percentage
of improvement of the initial solution cost.
Fig. 2. Impact of the tabu list size on the solution cost.
Fig. 3. Impact of the maximum number of iterations on the solution cost.
Results show that increasing the maximum number of
iterations improves the solution cost; hence, higher
quality solutions are obtained. This occurs until this
parameter reaches a value beyond which solutions have almost
identical costs in spite of the increasing number of iterations.
Indeed, between 0 and 5,000 iterations, the solution quality
is enhanced by a factor of 2.3. However, between 5,000 and
10,000 iterations, the enhancement factor is only
1.35. This is mainly due to the influence of the other
parameters and to the initial solution algorithm, which
yields a satisfactory solution.
The analyses regarding the impact of the maximal
number of iterations on the execution time and the quantity
of moves yield the same results. Fig. 4 shows this impact.
Within the range of 800 to 5,000 iterations, the execution time
increases 10-fold and the quality of the solution is enhanced
efficiently, i.e., solution costs are reduced by 40 percent.
Thus, we conclude that the quality of the solution is
enhanced by increasing the maximum number of iterations
until a value is reached where the execution time increases
without enhancing the solution efficiently. This value, which
is used in the following experiments, is equal to
approximately eight times $N$.
5.2 Comparing Tabu Search Approach with CPLEX-Based Method
As stated previously, none of the analyzed clustering
mechanisms can be compared with our approach.
Consequently, we devised and implemented a second method to
solve the CBP clustering problem. The objective is to
assess the quality of the solutions found with the first
approach. This new method is based on CPLEX.
CPLEX could not be used to directly solve the
partitioning problem, since the number of feasible clusters
is quite significant and CBP contains saturated constraints.
For these reasons, this new method finds an optimal
covering of the hypergraph and uses this covering to find
the best partitioning of the hypergraph. The method
contains three steps. In the first step, a set of feasible
clusters is defined, using the graph $G_i$. The second step,
called the optimization phase, returns an
optimal covering of the hypergraph using CPLEX. The last
step, called the postoptimization phase, uses the results
of the previous step to find a hypergraph partitioning. The
experiments were run on the server described in the previous
section, using CPLEX 10.1, and consider square topology
sensor networks in which all nodes are active and the QoS
parameters are fixed (i.e., the values of $step$ and $mut$). The
number of nodes $N$ varies and the maximum number of
iterations is set at $8N$.
Fig. 5 illustrates the solution costs of both methods (tabu search and
CPLEX) as the number of nodes varies. The solutions found by the two
methods are similar, except when the network exceeds 700 nodes, at
which point tabu search yields better solutions. This is due to the
post-optimization phase of the second method, which does not
necessarily return an optimal solution. In addition, the first step
restricts the set of solutions considered.
Fig. 6 illustrates the execution time of both methods as the number of
nodes varies. The execution times are very similar for networks of
fewer than 700 nodes. Beyond this size, however, the execution time of
the CPLEX-based method increases sharply, whereas that of the tabu
search increases moderately. This is mainly due to the fact that the
simplex method, used by CPLEX in this case, has exponential worst-case
complexity.
Consequently, the results of this experiment show that, on the one
hand, the solutions returned by our tabu search method are of high
quality in terms of cluster cost and execution time. On the other
hand, the approach behaves well with network extensibility, i.e., the
execution time grows moderately as the network size increases.
Fig. 4. Impact of the maximum number of iterations on the execution time.
The aforementioned experiments were conducted using a square topology.
In order to validate our results with other topologies, the same study
was conducted using a random topology in which nodes are randomly
placed over a 1 km² surface. The same parameter values as in the
square topology are used, e.g., the same values of step and mut, and
the maximum number of iterations allowed is set at 8 `. Node locations
follow a uniform distribution. Random real numbers are generated using
the Boost library [17]. Node density varies from 20 to 900 nodes/km².
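As an illustration of the random topology setup, the following sketch draws node positions uniformly over a square surface. Python's standard random module stands in for the Boost random library used in the paper, and the function name, the seed parameter, and the rounding of the node count are assumptions made for the example.

```python
import random
from typing import List, Tuple


def random_topology(density_per_km2: int,
                    side_km: float = 1.0,
                    seed: int = 0) -> List[Tuple[float, float]]:
    """Place nodes uniformly at random over a side_km x side_km square."""
    rng = random.Random(seed)  # seeded so that an experiment can be repeated
    node_count = round(density_per_km2 * side_km ** 2)
    return [(rng.uniform(0.0, side_km), rng.uniform(0.0, side_km))
            for _ in range(node_count)]


# The densities below span the range reported in the text (20 to 900 nodes/km²).
sparse = random_topology(20)
dense = random_topology(900)
```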
Fig. 7 shows the solution costs of the tabu search method and the
CPLEX-based method as the node density varies. Results are similar to
those obtained with the square topology: the solution costs of both
methods are almost identical. Analyzing the execution times of both
methods leads to the same conclusion as for the square topology.
Fig. 8 illustrates the execution time of both methods as the node
density varies. The execution time increases slightly for the tabu
search-based method, whereas it increases drastically for the
CPLEX-based method. The reasons are the same as for the square
topology.
These investigations validate the results obtained using square
topologies. On the one hand, the solutions returned by the tabu search
method are of high quality in terms of cluster cost and execution
time. On the other hand, the tabu search method behaves well with
network extensibility. Both observations hold for square and random
topologies.
5.3 Comparing Centralized Approach, Distributed Approach, and TAG
We devised the centralized approach for the CBP because we noticed
from our previous experiments, described in [3], that the distributed
approach does not ensure an optimal solution. We conducted further
experiments in order to evaluate the difference in performance between
the centralized and distributed approaches.
Fig. 5. Comparing the solution costs (tabu search and CPLEX): Square topology.
Fig. 6. Comparing execution time (tabu search and CPLEX): Square topology.
Fig. 9 depicts the solution costs of the centralized and distributed
approaches as the network size varies. Results show that the clusters
built using the centralized approach indeed have better costs than
those built using the distributed approach. Moreover, the difference
between the two costs grows as the network size increases. For
example, the centralized approach improves the quality of the clusters
by 500 percent in a network with 100 nodes. The reason is that, in the
distributed approach, each node has information limited to its own
neighborhood, whereas the centralized approach finds the best clusters
among all possible combinations.
In order to build the clusters, both approaches need to exchange
messages between the network nodes. In the distributed approach, the
nodes have to send the messages that allow the clusters to be created
(e.g., informing the other nodes of the QoS required by the
applications). In the centralized approach, the nodes have to send
their information to a central node, which collects all of this
information and runs the cluster building algorithm. The energy
consumed by sending and receiving these messages should not be
neglected. Consequently, we conducted experiments to measure the
energy consumed by the two approaches in order to build the clusters.
In the centralized approach, the central node needs the following
information in order to run the cluster building algorithm: 1) the
coordinates of each node, in order to build the graph G_i and
calculate the cluster costs, 2) the measurement of each node, in order
to build the graph G_i, and 3) the active flag of each node (i.e., a
flag indicating whether or not the node is able to cover its zone).
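The three items a node reports to the central node can be pictured as a small record. The sketch below is only an illustration: the class name, field names, and binary layout (three 32-bit floats followed by a one-byte flag) are hypothetical and are not the message format used in the paper.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout: x, y, and measurement as little-endian 32-bit
# floats, followed by a 1-byte boolean flag.
REPORT_FORMAT = "<fff?"


@dataclass(frozen=True)
class NodeReport:
    x: float            # node coordinates, used for the graph and cluster costs
    y: float
    measurement: float  # latest sensed value
    active: bool        # whether the node is able to cover its zone

    def pack(self) -> bytes:
        """Serialize the report for transmission to the central node."""
        return struct.pack(REPORT_FORMAT, self.x, self.y,
                           self.measurement, self.active)

    @classmethod
    def unpack(cls, payload: bytes) -> "NodeReport":
        """Rebuild a report from its serialized form."""
        return cls(*struct.unpack(REPORT_FORMAT, payload))


report = NodeReport(x=0.5, y=0.25, measurement=21.5, active=True)
payload = report.pack()
```

Under this assumed layout each report occupies 13 bytes, and every node in the network has to deliver one such report to the central node before clustering can start.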
We conducted several simulations using OMNeT++ [19], with a multihop
network communication model.
Fig. 7. Comparing solution costs (tabu search and CPLEX): Random topology.
Fig. 8. Comparing execution time (tabu search and CPLEX): Square topology.
Fig. 10 shows the energy consumed to build the clusters by the
centralized and distributed approaches. Results show that the
distributed approach consumes less energy than the centralized
approach, and the gap between the two grows as the network size
increases. The reason is that the central node requires a considerable
number of messages in order to collect all the node information. We
conclude that the centralized approach is less efficient than the
distributed approach in the cluster building phase.
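One way to see why collecting node information is costly in a multihop network: a report from a node located h hops from the central node is transmitted h times when no in-network aggregation is applied. The sketch below counts these transmissions on a square grid. The 4-neighbor connectivity, the central node placed at a corner, and the absence of aggregation are assumptions made for illustration; they do not describe the simulation setup of the paper.

```python
from collections import deque
from typing import Dict, Tuple


def collection_transmissions(side: int, sink: Tuple[int, int] = (0, 0)) -> int:
    """Total transmissions for every node of a side x side grid to send one
    report to the sink over shortest multihop paths."""
    hops: Dict[Tuple[int, int], int] = {sink: 0}
    queue = deque([sink])
    while queue:  # breadth-first search gives the hop count of each node
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < side and 0 <= ny < side and (nx, ny) not in hops:
                hops[(nx, ny)] = hops[(x, y)] + 1
                queue.append((nx, ny))
    # Each report is forwarded once per hop on its way to the sink.
    return sum(hops.values())


small = collection_transmissions(10)   # 100 nodes
large = collection_transmissions(20)   # 400 nodes
```

Under these assumptions, the 100-node grid needs 900 transmissions and the 400-node grid needs 7,600: the node count is multiplied by four while the number of transmissions is multiplied by more than eight, which matches the widening gap described above.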
We conducted other simulations in order to compare the total energy
consumed (i.e., the energy consumed by the cluster building algorithms
plus the energy consumed during the data collection phase) by the
centralized and distributed approaches. The results of our approaches
are compared to those generated by an existing data collection
algorithm called TAG [13]. This algorithm was chosen due to its
importance in sensor networks: it is used for data collection in
TinyOS, which is the most common operating system in sensor networks.
Fig. 11 illustrates the total energy consumption as the network size
varies. For the three algorithms, energy consumption increases
proportionally to the number of nodes, as expected. However, the slope
of the curves that represent our approaches is much less steep than
that of the TAG algorithm, which means that an increase in the number
of nodes has less impact on our approaches than on TAG. This is due to
the fact that our system filters the data sensed by nodes within the
same zone much better than TAG does. The second conclusion is that,
when the data collection phase is considered, the centralized approach
performs better than the distributed approach, because the clusters it
builds are of better quality.
5.4 Comparing Tabu Search-Based to Simulated Annealing-Based Approaches
Fig. 9. Comparing solution cost (distributed and centralized approaches).
Fig. 10. Comparing the energy consumption to build the clusters (centralized and distributed approaches).
Simulated annealing is a probabilistic approach to solving
optimization problems. Kirkpatrick et al. [12] use it to solve
combinatorial optimization problems. For a given optimization problem,
simulated annealing may accept solutions that degrade the cost, even
though such solutions are discarded later if they fail to improve the
best solution. The decision to accept or reject a cost-degrading
solution is made randomly. The same query and topology were used to
compare the results obtained by tabu search and by simulated
annealing. Fig. 12 presents this comparison. In general, the tabu
search algorithm performs better than the simulated annealing
algorithm.
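The random acceptance of cost-degrading solutions is commonly implemented with the Metropolis criterion, as in the original simulated annealing formulation [12]. The text does not state the exact acceptance rule or cooling schedule used in these experiments, so the sketch below is an assumption and its names are hypothetical.

```python
import math
import random


def accept_move(delta_cost: float, temperature: float,
                rng: random.Random) -> bool:
    """Metropolis criterion: always accept improvements, and accept a
    degradation of delta_cost with probability exp(-delta_cost / T)."""
    if delta_cost <= 0:
        return True           # the candidate is at least as good
    if temperature <= 0:
        return False          # frozen system: no degradation is accepted
    return rng.random() < math.exp(-delta_cost / temperature)


rng = random.Random(1)
# Fraction of accepted moves for the same degradation at two temperatures.
hot = sum(accept_move(1.0, 10.0, rng) for _ in range(10000)) / 10000
cold = sum(accept_move(1.0, 0.1, rng) for _ in range(10000)) / 10000
```

At a high temperature most degrading moves are accepted, while at a low temperature almost none are, so the search gradually shifts from exploration to pure improvement. Tabu search, by contrast, uses a memory of recent moves rather than randomness to escape local optima.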
6 CONCLUSION
This paper has presented a heuristic approach based on tabu search to
solve clustering problems in which the numbers of clusters and cluster
heads are not known beforehand. To our knowledge, this is the first
time that the clustering problem has been modeled and solved with
these constraints. The tabu search adaptation consists of defining
three types of moves that allow reassigning nodes to clusters,
selecting cluster heads, and removing existing clusters. Such moves
use the largest clique in a feasibility cluster graph, which
facilitates the analysis of several solutions and makes it possible to
compare them using a gain function.
The performance of this novel approach was evaluated with different
network sizes and topologies. It was compared to that obtained by a
second, CPLEX-based resolution method, a third approach based on the
simulated annealing heuristic, and an existing algorithm (TAG).
Results show that the tabu search-based resolution method provides
quality solutions in terms of cluster cost and execution time.
Furthermore, it behaves well with network extensibility. Nevertheless,
compared to a distributed approach, this centralized approach suffers
from a major drawback linked to the additional costs generated by
communicating the network node information and the time required to
solve an optimization problem.
Fig. 11. Comparing the energy consumption (centralized, distributed, and TAG approaches).
Fig. 12. Comparing the solution cost (tabu search and simulated annealing).
We conducted several experiments to compare the performance of our
centralized approach with that of a distributed approach, and we
conclude that the centralized approach is less efficient than the
distributed approach in the cluster building phase. Nevertheless, the
centralized approach is more efficient in the data collection phase.
Consequently, the centralized approach is preferable when the data
collection phase is long, whereas the distributed approach should be
used to run queries with a short execution time.
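This trade-off can be expressed as a break-even point. The sketch below assumes a simple linear model in which each approach pays a one-time cluster building cost and then a constant energy cost per data collection round. The model, the function name, and the numeric values are illustrative assumptions and are not taken from the experiments.

```python
import math
from typing import Optional


def break_even_rounds(build_central: float, build_distributed: float,
                      per_round_central: float,
                      per_round_distributed: float) -> Optional[int]:
    """Smallest number of collection rounds after which the centralized
    approach has consumed strictly less total energy than the distributed one.

    Returns None when the centralized approach saves nothing per round, so
    its extra building cost is never recovered.
    """
    saving_per_round = per_round_distributed - per_round_central
    extra_build_cost = build_central - build_distributed
    if saving_per_round <= 0:
        return None
    # Need: extra_build_cost < rounds * saving_per_round.
    return max(0, math.floor(extra_build_cost / saving_per_round) + 1)


# Illustrative values only (arbitrary energy units).
rounds = break_even_rounds(build_central=50.0, build_distributed=10.0,
                           per_round_central=2.0, per_round_distributed=4.0)
```

With these illustrative values the two approaches have consumed the same total energy after 20 rounds, and the centralized approach is ahead from round 21 onward. Queries shorter than the break-even point favor the distributed approach.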
REFERENCES
[1] P.K. Agarwal and C.M. Procopiuc, "Exact and Approximation Algorithms for Clustering," Algorithmica, vol. 33, no. 2, pp. 201-226, June 2002.
[2] P. Basu and J. Redi, "Effect of Overhearing Transmissions on Energy Efficiency in Dense Sensor Networks," Proc. Third Int'l Symp. Information Processing in Sensor Networks (IPSN '04), pp. 196-204, Apr. 2004.
[3] A. El Rhazi and S. Pierre, "A Data Collection Algorithm Using Energy Maps in Sensor Networks," Proc. Third IEEE Int'l Conf. Wireless and Mobile Computing, Networking, and Comm. (WiMob '07), 2007.
[4] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, "Optimal Energy Aware Clustering in Sensor Networks," Sensors, pp. 258-269, 2002.
[5] F. Glover, E. Taillard, and D. Werra, "A User's Guide to Tabu Search," Annals of Operations Research, vol. 41, no. 1-4, pp. 3-28, May 1993.
[6] M. Gondran and M. Minoux, Graphes et Algorithmes, second ed. Editions Eyrolles, 1985.
[7] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "An Application-Specific Protocol Architecture for Wireless Microsensor Networks," IEEE Trans. Wireless Comm., vol. 1, no. 4, pp. 660-670, Oct. 2002.
[8] J.J. Lee, B. Krishnamachari, and C.C.J. Kuo, "Impact of Heterogeneous Deployment on Lifetime Sensing Coverage in Sensor Networks," Proc. IEEE Sensor and Ad Hoc Comm. and Networks Conf. (SECON '04), pp. 367-376, 2004.
[9] W. Liang and Y. Liu, "Online Data Gathering for Maximizing Network Lifetime in Sensor Networks," IEEE Trans. Mobile Computing, vol. 6, no. 1, pp. 2-11, Jan. 2007.
[10] S. Jeremy, The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley, 2002.
[11] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu, "A Local Search Approximation Algorithm for k-Means Clustering," Proc. 18th Ann. ACM Symp. Computational Geometry (SoCG '02), pp. 10-18, 2002.
[12] S. Kirkpatrick, C.C. Gelatt Jr., and M.P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, pp. 671-680, 1983.
[13] S.R. Madden, M.J. Franklin, J.M. Hellerstein, and W. Hong, "TAG: Tiny Aggregation Service for Ad-Hoc Sensor Networks," Proc. Fifth Symp. Operating Systems Design and Implementation (OSDI '02), pp. 131-146, 2002.
[14] O. Moussaoui, A. Ksentini, M. Naimi, and M. Gueroui, "A Novel Clustering Algorithm for Efficient Energy Saving in Wireless Sensor Networks," Proc. Seventh Int'l Symp. Computer Networks (ISCN '06), pp. 66-72, 2006.
[15] S. Raghuwanshi and A. Mishra, "A Self-Adaptive Clustering Based Algorithm for Increased Energy-Efficiency and Scalability in Wireless Sensor Networks," Proc. IEEE 58th Vehicular Technology Conf. (VTC '03), vol. 5, pp. 2921-2925, 2003.
[16] O. Younis and S. Fahmy, "Distributed Clustering in Ad-Hoc Sensor Networks: A Hybrid, Energy-Efficient Approach," Proc. IEEE INFOCOM, pp. 629-640, 2004.
[17] http://boost.org/libs/random/index.html, Feb. 2007.
[18] http://www.ilog.com/products/cplex/, Feb. 2008.
[19] http://www.omnetpp.org/, Feb. 2008.
Abdelmorhit El Rhazi received the bachelor's degree in software
engineering from the École Mohammadia d'Ingénieur, Rabat, Morocco, in
1995 and the master's degree in computer engineering from the École
Polytechnique de Montréal in 2003. He is currently with the Mobile
Computing and Networking Research Laboratory (LARIM), Department of
Computer Engineering, École Polytechnique de Montréal. His work
revolved around data collection and the energy consumption in wireless
sensor networks.
Samuel Pierre is currently a professor of computer engineering at
École Polytechnique de Montréal, where he is the director of the
Mobile Computing and Networking Research Laboratory (LARIM) and an
NSERC/Ericsson industrial research chair in next-generation mobile
networking systems. He is the author or coauthor of six books, 15 book
chapters, 16 edited books, and more than 350 other technical
publications including journal and proceedings papers. He received the
Best Paper Award from the Ninth International Workshop in Expert
Systems and Their Applications (France, 1989), the Distinguished Paper
Award from OPNETWORK 2003 (Washington), a special mention from
Telecoms Magazine (France, 1994) for one of his coauthored books,
Télécommunications et Transmission de Données (Eyrolles, 1992), among
others. His research interests include wireline and wireless networks,
mobile computing, performance evaluation, artificial intelligence, and
electronic learning. He is an associate editor of the IEEE
Communications Letters, the IEEE Canadian Journal of Electrical and
Computer Engineering, and the IEEE Canadian Review. He is also a
regional editor of the Journal of Computer Science. He also serves on
the editorial board of Telematics and Informatics (Elsevier Science)
and the International Journal of Technologies in Higher Education
(IJTHE). He is a fellow of the Engineering Institute of Canada, a
member of the Canadian Academy of Engineering, and a senior member of
the IEEE.
444 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 4, APRIL 2009