a Department of Information Systems and Business Analytics, Deakin University, Burwood, VIC, Australia
b Department of Computer Science, Aberystwyth University, Wales, UK
c The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
d Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA

Corresponding author: A. Moayedikia (amoayedi@deakin.edu.au), Department of Information Systems and Business Analytics, Faculty of Business and Law, Deakin University, 70 Elgar Road, Burwood 3125, VIC, Australia.
http://dx.doi.org/10.1016/j.engappai.2015.06.003
ARTICLE INFO

Article history:
Received 12 September 2014
Received in revised form 30 April 2015
Accepted 1 June 2015

ABSTRACT
The conventional bee colony optimization (BCO) algorithm, one of the recent swarm intelligence (SI) methods, is good at exploration whilst being weak at exploitation. In order to improve the exploitation power of BCO, in this paper we introduce a novel algorithm, dubbed weighted BCO (wBCO), that allows the bees to search the solution space deliberately while considering policies to share the attained information about the food sources heuristically. For this purpose, wBCO considers global and local weights for each food source, where the former is the rate of popularity of a given food source in the swarm and the latter is the relevancy of a food source to a category label. To preserve diversity in the population, we embedded new policies in the recruiter selection stage to ensure that uncommitted bees follow the most similar committed ones. Thus, the local food source weighting and recruiter selection strategies make the algorithm suitable for discrete optimization problems. To demonstrate the utility of wBCO, the feature selection (FS) problem is modeled as a discrete optimization task and tackled by the proposed algorithm. The performance of wBCO and its effectiveness in dealing with the feature selection problem are empirically evaluated on several standard benchmark optimization functions and datasets and compared to state-of-the-art methods, exhibiting the superiority of wBCO over the competitor approaches.
© 2015 Elsevier Ltd. All rights reserved.
Keywords:
Bee colony optimization
Categorical optimization
Classification
Feature selection
Weighted bee colony optimization
1. Introduction
Swarm intelligence (SI) is one of the well-known classes of optimization algorithms and refers to methods relying on the intelligence of a swarm to locate the best parts of the solution space. Particle swarm optimization (PSO) (Kennedy and Eberhart, 1995), ant colony optimization (ACO) (Dorigo et al., 1999) and BCO (Nikolic and Teodorovic, 2013; Teodorovic et al., 2006) are examples of SI algorithms. Many problems, such as text clustering (Dziwiński et al., 2012) and feature selection (Forsati et al., 2014; Forsati et al., 2012; Unler and Murat, 2010), can be modeled as discrete optimization problems and solved using SI algorithms.
BCO is one of the most recent developments of swarm intelligence, proposed by Teodorovic et al. (2006), and has been successfully applied to many fields of science, including image analysis (Ghareh Mohammadi and Saniee Abadeh, 2014).
In the last stage of the backward movement, it is decided which bees will follow the other bees (recruiter selection). Within a generation, the algorithm iterates between the second and the fifth stages until all the bees have created their full solutions. For further details about BCO, interested readers may refer to the work of Forsati et al. (2015).
The advantage of BCO is its ability to tune the search direction in the early stages of exploration, while other SI algorithms such as ACO require full traversals by all ants to adjust the pheromone weights accurately and finally identify the worthwhile exploration paths. Once the bees perform the backward movement, they in effect try to distinguish worthwhile from non-worthwhile solution paths. This allows the search direction to be tuned toward the best parts of the solution space found so far. PSO has a similar characteristic, in which the flock of birds flies toward the global and/or local best solutions while exploring the solution space.
Swarm intelligence algorithms (including BCO) mainly rely on randomness to search the solution space. This gives the swarm freedom in searching the solution space, but randomness can degrade the exploitation ability of BCO, in the sense that worthwhile parts of the space remain undiscovered or are unintentionally ignored. In other words, the bee colony has weak exploitation ability while having a good level of exploration (Kang et al., 2011). Therefore, to increase the exploitation ability of BCO, we introduce a new variant called weighted bee colony optimization (wBCO), which considers new policies for measuring the loyalty degrees of the bees and for recruiter selection. The formulation of recruiter selection makes the algorithm applicable to classification and regression problems.
As explained, each backward step has three stages: fitness assessment, loyalty assessment, and recruiter selection. In the backward step of wBCO, where the bees measure how loyal they are to their created (partial) solutions, the algorithm considers two weights for each food source. One is a global weight, which measures how popular a given food source is in the swarm, and the other is a local weight, which indicates the extent to which a selected food source can contribute to the category label of the classification problem. In the recruiter selection step, in order to preserve diversity, the followers select their recruiters in a stepwise filtering process. We apply two filtering stages: one based on similarity and the other based on fitness values. In similarity filtering, for a given follower a set of recruiters is selected based on traversal similarity, and then the follower selects the recruiter bee with the closest fitness value. This recruiter selection strategy is only applicable if the variables of a classification problem accept only discrete values (e.g., integers, binary values, letters).
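To make the two filtering stages concrete, the following minimal Python sketch assigns a follower to a recruiter; it is an illustration rather than the authors' implementation, and the Hamming-style similarity measure and the 0.5 threshold are assumptions.

def hamming_similarity(a, b):
    # Fraction of positions at which two equal-length discrete traversals agree.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def select_recruiter(follower, committed, sim_threshold=0.5):
    # Stage 1 (similarity filtering): keep committed bees whose traversals
    # are sufficiently similar to the follower's.
    similar = [c for c in committed
               if hamming_similarity(follower["solution"], c["solution"]) >= sim_threshold]
    if not similar:
        similar = committed      # fall back to all committed bees
    # Stage 2 (fitness filtering): follow the bee with the closest fitness value.
    return min(similar, key=lambda c: abs(c["fitness"] - follower["fitness"]))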
To investigate the effectiveness of the proposed algorithm in application, we further applied wBCO to feature selection (FS), modeling the curse of dimensionality as a discrete optimization task to examine whether wBCO is applicable to classification tasks. Other applications such as text classification can also be modeled using wBCO, but as a result of the wide applications of FS, including bioinformatics (Chyzhyk et al., 2014), systems monitoring (Shen and Jensen, 2004), text mining (Jensen and Shen, 2004), and image processing (Da Silva et al., 2011), we decided to model FS as a discrete optimization task with wBCO. The new feature selection algorithm is called FS-wBCO, and its successful implementation indicates that the proposed wBCO is also applicable in fields relying on FS. The contributions of this paper are summarized as follows.
• To improve the exploitation power of BCO, each food source has two weights: global and local. With the former the algorithm measures how popular the food source is in the swarm, while with the latter it determines the extent to which the selected food source is relevant to a category label.
• In line with the exploitation improvements, we modify the recruiter selection of the original BCO, using heuristics to preserve diversity in the bee population by assigning each uncommitted bee to the most similar committed one.
• To investigate the utility of wBCO, feature selection is modeled as a discrete optimization problem, resulting in another algorithm known as FS-wBCO. Experiments are carried out to investigate the efficacy of both wBCO and FS-wBCO.
2. Literature review
As we are introducing a new BCO algorithm, the focus of this section is on some of the recent developments of bee colony-based algorithms. Regarding feature selection algorithms, interested readers can refer to Liu and Motoda (1998). In the literature, two approaches to bee colony-based algorithms have been proposed. One is the artificial bee colony (ABC) algorithm proposed by Karaboga and Akay (2009) and Karaboga et al. (2014), and the other is BCO, proposed by Teodorovic et al. (2006). As both algorithms rely on the natural behavior of bees, we also consider the ABC algorithm in this paper.
As shown in Table 1, bee colony improvements mainly aim at improving either the exploration or the exploitation of the algorithm. Hence, in this review we divide the bee colony improvements into these two categories. A third category is related to algorithms targeting improvements in both the exploration and exploitation powers of BCO.
There are some BCO algorithms focusing on improvements of exploration power. Gao and Liu (2011) proposed IABC, which uses differential evolution and is suitable for global optimization.
Table 1
An overview of the reviewed articles: Lu et al. (2014), Huang and Lin (2011), Kumar (2014), Forsati et al. (2015), Alzaqebah and Abdullah (2014), Karaboga and Akay (2011), Gao and Liu (2011), Li et al. (2012), Akbari et al. (2012), Imanian et al. (2014), Alatas (2010), Kashan et al. (2012), and wBCO.
As the bees leave their hive, each takes NC random steps to create a partial solution. Therefore, the initial partial solution has size NC and grows iteratively by at most NC in each subsequent forward step. The execution of this stage is similar to the original BCO algorithm, and each bee explores the solution space randomly. The pseudo code of the partial solution creation process is shown in Algorithm 1.
As outlined in Algorithm 1, each bee creates its partial solution of size NC, as shown in Lines 1 to 5. The condition on the selection of a food source, shown in Line 4, checks the previously traversed food sources to make sure that the current food source has not been considered in the previous traversals of the same bee; if it has not, the bee considers it as the next possible selection. Once the next food source has been selected by all the bees, the length of the solution increases. This stage terminates once all the bees have created their partial solutions, with a size of NC or lower.

[Figure: the backward step of wBCO, comprising loyalty assessment, based on the global and local food source weights, followed by recruiter selection through the stepwise filtering assignment.]
Algorithm 1: Partial solution creation
Input:
  An initialized population
  NC: the size of the constructive step
Output:
  A partial solution
Algorithm:
1. Len ← 0
2. while Len < NC // partial solutions of size NC have not yet been created
3.   foreach bee b
4.     Randomly select a food source which has not been selected before by the same bee b;
5.   end foreach
6.   Len ← Len + 1; // increase the length of the currently created solutions by one
7. end while
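For readers who prefer executable code, a minimal Python rendering of Algorithm 1 follows; the list-based solution representation and uniform random choice are our assumptions.

import random

def create_partial_solutions(bees, food_sources, NC):
    # Each bee appends NC random, previously unvisited food sources to its
    # partial solution (Lines 1-7 of Algorithm 1).
    for _ in range(NC):                       # Len = 0 .. NC-1
        for solution in bees:                 # foreach bee b
            unvisited = [f for f in food_sources if f not in solution]
            if unvisited:                     # Line 4: skip already-visited sources
                solution.append(random.choice(unvisited))
    return bees

# Example: three bees taking NC = 2 constructive steps over six food sources.
print(create_partial_solutions([[] for _ in range(3)], list(range(6)), NC=2))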
In the loyalty assessment stage, each bee measures its loyalty to its partial solution through the global weights of its selected food sources:

$global_i = \frac{\sum_{k=1}^{Sel_i} Fit_k}{\sum_{m=1}^{Unsel_i} Fit_m}$  (1)

$global_i = \begin{cases} \text{Eq. (1)}, & Unsel_i \neq 0 \text{ and } Sel_i \neq 0 \\ \sum Select_i, & Unsel_i = 0 \\ 0, & Sel_i = 0 \end{cases}$  (2)

where $Sel_i$ and $Unsel_i$ are the numbers of bees that have selected and unselected the ith food source, respectively; $\sum_{k=1}^{Sel_i} Fit_k$ is the summation of the fitness values of the bees that have selected the ith food source; $\sum ignored_i$ is the total number of bees that have ignored the ith food source; $\sum_{m=1}^{Unsel_i} Fit_m$ is the summation of the fitness values of the bees that have unselected the ith food source; and finally $\sum Select_i$ is the total number of bees that have selected the ith food source.
Algorithm 2 shows the global weight evaluation. In Line 3, if the global weight of a food source has not been computed before by another bee, it is evaluated through Lines 4 to 9. This process iterates over all the food sources (Lines 2 to 10) and over all the bees (Lines 1 to 11). When a food source has been selected by all the members of the swarm, $Unsel_i$ is zero, and the global weight of a food source that has not been selected by any of the swarm members is equal to zero.
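As a concrete illustration of Eq. (2), consider the following minimal Python sketch; the handling of the $Unsel_i = 0$ case follows our reading of the prose above and is an assumption.

def global_weight(i, fitness, selected):
    # fitness[b]: fitness value of bee b; selected[b]: the set of food
    # sources traversed by bee b so far.
    sel   = [fitness[b] for b in fitness if i in selected[b]]
    unsel = [fitness[b] for b in fitness if i not in selected[b]]
    if not sel:                    # ignored by every member of the swarm
        return 0.0
    if not unsel:                  # selected by every member (Unsel_i = 0);
        return float(len(sel))     # assumed fallback: the number of selecting bees
    return sum(sel) / sum(unsel)   # Eq. (2), main case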
The local weight of the ith food source selected by bee $B_j$ measures its relevance to the category labels:

$correlation_i = Fit_{B_j} \sum_{x=1}^{X} Cor(C_p^x, F_i^j)$  (3)

where $Cor(C_p^x, F_i^j)$ is the correlation between the ith food source $F_i^j$ selected by bee $B_j$ and the xth predicted class label $C_p^x$, $Fit_{B_j}$ is the fitness value of bee $B_j$, and $X$ is the number of predicted class labels.
Each follower then follows one and only one recruiter, selected according to Eqs. (7) and (8):

$FollowingProbability_{cb}^{ub} = \frac{E_{cb}}{\sum_{i=0}^{cs} E_i}$  (7)

$E_{cb} = 1 - \left| F_{Com}^{cb} - F_{uncom}^{ub} \right|$  (8)

where $FollowingProbability_{cb}^{ub}$ is the probability that a committed bee cb is followed by the uncommitted bee ub, $\sum_{i=0}^{cs} E_i$ is the summation of the E values over all cs similar committed bees, $F_{Com}^{cb}$ is the fitness value of the cbth similar committed bee, and $F_{uncom}^{ub}$ is the fitness value of the uncommitted bee. Therefore, the lower the difference $|F_{Com}^{cb} - F_{uncom}^{ub}|$, the higher the value of $1 - |F_{Com}^{cb} - F_{uncom}^{ub}|$, which increases the probability of following. In problems where the fitness function does not satisfy $|F_{Com}^{cb}| < 1$ and $|F_{uncom}^{ub}| < 1$, the fitness values should be normalized within the range [-1, 1]. Also, in Eq. (8), $F_{uncom}^{ub} \neq F_{Com}^{cb}$ is always true, since if $F_{uncom}^{ub} = F_{Com}^{cb}$ then both of the bees would be either committed or uncommitted.

Fig. 2 shows the difference between random and stepwise filtering assignments of the bees. In Figs. 2 and 3, each row corresponds to a bee. The first column is the bee identifier; the second through the sixth columns each correspond to a food source, where values of 1 and 0 refer to the selection or non-selection of that food source. Binary digits are shown as a symbol of discrete optimization, but letters or integer numbers can also be used. The seventh column indicates whether the bee is committed (C) or uncommitted (U), and finally the last column is the fitness value of the bee.

The first advantage of this type of assignment is that it helps the algorithm improve its exploitation power by following the most similar bee, which also preserves diversity in the population, as shown in the white box of Fig. 2. In Fig. 3, the stepwise filtering assignment is shown: in the first step, committed bees are filtered based on their similarities to the uncommitted ones, and in the second step, the similar bees are filtered according to their fitness values.
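To illustrate Eqs. (7) and (8), the following minimal Python sketch samples a recruiter in proportion to its following probability; the roulette-wheel sampling is our assumption, and the filtering that produces the similar committed bees is sketched earlier.

import random

def choose_recruiter(similar_committed, f_uncom):
    # similar_committed: (bee_id, fitness) pairs that survived stepwise
    # filtering; fitness values are assumed to lie in [-1, 1] as required.
    E = [1.0 - abs(f_com - f_uncom) for _, f_com in similar_committed]  # Eq. (8)
    probs = [e / sum(E) for e in E]                                     # Eq. (7)
    ids = [b for b, _ in similar_committed]
    return random.choices(ids, weights=probs, k=1)[0]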
[Fig. 2. Random versus stepwise filtering assignment in an example population of eight bees (B1 to B8), four committed (B1, B2, B6, B8) and four uncommitted (B3, B4, B5, B7); each row lists a bee's binary food-source selections, its commitment flag (C/U) and its fitness value.]

[Fig. 3. Stepwise filtering assignment for the uncommitted bee B7 (fitness 0.19): the committed bees are first filtered by traversal similarity, leaving B1 (fitness 0.91) and B6 (fitness 0.96), and then filtered by fitness, yielding B1 as the final recruiter.]
where $f_i$ is the ith value of a feature and $C_p^x$ is the xth predicted class label. $P(f_i \mid C_p^x)$ calculates the probability of co-occurrence of $f_i$ and $C_p^x$. The mutuality degree between the predicted class label and the selected subset of features, normalized by the summation of the mutuality degrees, is the local weight of a feature. Algorithms 5 and 6 explain the global and local weighting procedures of a given feature. The global weight is viewed as the ratio of the global acceptance of a feature to the total number of selected features in the population, while the local viewpoint evaluates the degree of dependency of each selected feature on the predicted class label.
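As an illustration, the following minimal Python sketch computes one plausible version of the local weight: the average co-occurrence probability $P(f_i \mid C_p^x)$ of each selected feature with the predicted labels, normalized by the summation of the mutuality degrees. The function name, the data layout, and the exact form of the mutuality degree are our assumptions.

from collections import Counter

def local_weights(X, y_pred, selected):
    # X[n][j]: discrete value of feature j in sample n; y_pred[n]: predicted
    # class label of sample n; selected: indices of the selected features.
    N = len(y_pred)
    label_count = Counter(y_pred)
    degrees = {}
    for j in selected:
        # Mutuality degree: the average conditional co-occurrence probability
        # P(f_j = x | C = y) over the observed (value, label) pairs.
        pair_count = Counter((X[n][j], y_pred[n]) for n in range(N))
        degrees[j] = sum(pair_count[(X[n][j], y_pred[n])] / label_count[y_pred[n]]
                         for n in range(N)) / N
    total = sum(degrees.values())
    # Normalize by the summation of the mutuality degrees (the local weight).
    return {j: d / total for j, d in degrees.items()}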
In fact, for each feature we preserve two sorts of information: one is global and is used by all the bees in the loyalty assessment; the other is local, a personalized weight that differs from solution to solution. The interaction of these two sorts of information in Eq. (6) identifies the loyalty degree of a bee. The last step before the bees leave the hive for the next forward movement is distinguishing follower and recruiter bees, as explained in Section 3.3. Once the recruiter bees are identified, followers will follow one and only one recruiter according to Eqs. (7) and (8), where all the food sources are replaced with features. Table 2 defines the parameters of the wBCO and FS-wBCO algorithms.
Algorithm 5: Global weight assessment in feature selection
Input:
  A set of solutions
Output:
  Global weight of the selected features f_i
Algorithm:
1. foreach partial solution in the population
2.   foreach selected feature i of the current partial solution
3.     if the global weight for the ith feature has been calculated before
4.       Go to the next selected feature;
       else
5.       Count the number of bees that have selected the ith feature f_i;
6.       Sum the fitness values of all the bees that have selected feature f_i;
7.       Sum the fitness values of all the bees that have ignored feature f_i;
8.       Measure the global weight of the ith feature f_i according to Eq. (2);
9.     end if
10.  end foreach
11. end foreach

Table 2
Definition of wBCO and FS-wBCO parameters.

Variable                                            Symbol       Initial value
Number of bees                                      B            User-specified
Number of generations                               G            User-specified
Number of constructive steps                        NC           User-specified
Number of bees selecting a food source              Sel          Problem-dependent
Number of bees ignoring a food source               Unsel        Problem-dependent
Fitness of a bee that selected the ith food source  F_Sel_i      Problem-dependent
Fitness of a bee that ignored the ith food source   F_Unsel_i    Problem-dependent
The xth predicted class label                       C_p^x        Problem-dependent
The ith food source (feature) selected by bee j     F_i^j        Problem-dependent
Following measure of a committed bee (Eq. (8))      E_i          Problem-dependent
Fitness of the cbth similar committed bee           F_com^cb     Problem-dependent
Fitness of the uncommitted bee ub                   F_uncom^ub   Problem-dependent
The ith feature value                               f_i          Problem-dependent
(name lost in extraction)                           r            Problem-dependent
5. Experimental results
In this section we empirically investigate the effectiveness of wBCO and FS-wBCO. Initially we describe the datasets and functions used in these experiments. Then numerical experiments are carried out to investigate the efficacy of the proposed algorithms. The parameter settings of the proposed algorithms and the competitors are also explained in a separate section. As we primarily claim improvements over BCO, we conduct convergence behavior experiments on wBCO and compare the results against conventional BCO. The Wilcoxon statistical test is used to investigate the performance of the proposed variations statistically. In summary, this section aims at finding the answers to the following questions:
1) Are the proposed modifications good enough to enhance the convergence behavior of conventional BCO?
2) Does the number of bees have any effect on the loyalty assessment and stepwise filtering assignment stages of wBCO?
3) wBCO targets the exploitation power of BCO, but under what circumstances is it good enough to compete with other BCO variations that enhance both exploration and exploitation powers?
4) What is the main influential factor that can affect the performance of wBCO in feature selection?
5) How does FS-wBCO compare to other swarm and evolutionary FS algorithms? Can the algorithm outperform all competitors for all datasets?
The benchmark functions are described at: http://www.sfu.ca/~ssurjano/optimization.html.
Datasets can be downloaded from: https://drive.google.com/folderview?id=0B5R8ibfLZ7zmS3Q1bTFHb3NZdWM&usp=sharing.
The SVM implementation is available at: http://www.matthewajohnson.org/software/svm.html.
Table 3
The benchmark functions, grouped by solution-space class. Small: f1, f2 (Bohachevsky 1), f3 (Booth's), f4 (Hartman 3), f5 (Matyas), f6 (Schaffer's F6). Medium: f7 (Schwefel's 1.2), f8 (Michalewicz), f9, f10 (Schwefel's 2.22), f11 (Bohachevsky 2), f12 (Expansion F10). Large: f13 (Styblinski), f14 (Penalized 1), f15 (Penalized 2), f16 (Step), f17 (Rotated Rastrigin), f18 (Weierstrass). Each row lists the function's equation, search interval, dimension and global minimum.
With 30 or more independent executions, only slight changes were seen in the final results; therefore we set the number of independent executions to 30. In each execution we set the number of iterations to 100, as with more than 100 iterations only slight changes were seen in the final results.
5.3. Convergence behavior studies
In this section, we investigate the convergence behavior of BCO and wBCO, to address the first question. The convergence behavior experiments in this section show how many bees are required, at a minimum, for an algorithm to reach its near optimal solution. We refer to near optimal solutions because, according to Imanian et al. (2014), there is no specific algorithm that regularly achieves the best solution for all optimization problems. In population-based algorithms such as wBCO and BCO, where there is no connection between each pair of consecutive iterations, the number of bees has a greater impact on convergence than the number of iterations. Hence, increasing or decreasing the population size according to the solution space size shows how effective wBCO and BCO are in reaching near optimal solutions.
In Fig. 4, the Matyas function is selected as an example of a small-size function. This function is used because the experiments indicated that it is very sensitive to even small changes in the algorithms' settings; therefore it can better reflect the convergence speed of the algorithms. wBCO converges when the number of bees is 300 or higher, while BCO requires more bees to reach near optimal solutions.
Table 4
Dataset details (dataset abbreviations as in Table 8).

Dataset  (Training/test/validation)  No. of features  No. of classes  NC
BC       (490/42/167)                9                2               3
GL       (150/13/61)                 10               7               2
SO       (146/12/50)                 60               2               12
HR       (195/17/67)                 27               2               9
WDBC     (398/34/137)                30               2               10
VC       (592/52/202)                18               4               6
MUS      (272/60/144)                168              2               35
ARR      (316/27/109)                279              16              50
Table 5
Parameter settings for the numerical experiments of wBCO, BCO, DisrBCO, CDBCO and DBCO.
Table 6
Parameter settings of the implemented FS algorithms: FS-wBCO (proposed), BCOFS, BPSO and DisABC (swarm based), and RHS and HGAFS (evolutionary based).

[Fig. 4. Convergence behavior of wBCO and BCO on the Matyas function: minimization values versus the number of bees (B).]
This fact indicates that in wBCO, the larger the number of bees, the more information is collected about the food sources in the surrounding area of the hive. Hence, the algorithm can move the bees around the near optimal parts of the space from the early stages of execution, finally enhancing the convergence behavior of conventional BCO.
5.4. Performances and comparisons
In this section, the performance of FS-wBCO and wBCO is measured. In the experiments on wBCO we use the 18 minimization functions shown in Table 3, and relevant comparisons are made with other bee colony-based approaches. As explained
before, ABC and BCO are two different algorithms, but as ABC relies on the natural behavior of bees in locating food sources, we consider ABC as another relevant work for comparison. One may ask: once the bees' loyalty degrees are assessed, how are they divided into the two disjoint groups of committed and uncommitted bees? The bees can be divided based on different division strategies. In this paper, according to our experimentation, we divide the bees based on the average value of their loyalty degrees, where bees having loyalty degrees higher than the average are considered loyal.
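A minimal sketch of this division strategy follows; the dictionary-based representation is our assumption.

def split_by_loyalty(loyalty):
    # loyalty[b]: loyalty degree of bee b. Bees above the swarm average
    # become committed (potential recruiters); the rest are uncommitted.
    avg = sum(loyalty.values()) / len(loyalty)
    committed = [b for b, v in loyalty.items() if v > avg]
    uncommitted = [b for b, v in loyalty.items() if v <= avg]
    return committed, uncommitted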
It is important to note that the proposed algorithms are suitable only for discrete optimization; continuous optimization algorithms are not relevant here and cannot be compared. This restricts the algorithms to choosing only integer values. Hence, the competitors are implemented and tested in our setting and experimental conditions.
Table 7 shows the experiments of wBCO and its competitors with three different values of the population size. The results indicate that the performance of wBCO is sensitive to the solution space size and the number of bees: the higher the number of bees, the better wBCO performs. This is a result of the additional information that becomes available for measuring the loyalty degrees and carrying out the stepwise filtering assignment.
When the solution space is small (i.e., functions F1 to F6), setting B = 50 produces performance better than that of the competitors. As the size of the solution space increases, this superiority declines; once the solution space gets larger, more possible solutions become available and consequently more bees are required.
Table 7
Evaluations using the minimization functions with different population sizes (B = 50, B = 500 and B = 5000): the SD and Avg of the minimization values obtained by wBCO, BCO, DisrBCO, CDBCO and DBCO on the small (F1 to F6), medium (F7 to F12) and large (F13 to F18) functions.
Table 8
KS and CA results for training and testing data using the SVM classifier [values in the range 0 to 1]: Kappa statistics and classification accuracy of the evolutionary-based HGAFS and RHS, the swarm-based BPSO, DisABC and BCOFS, and the proposed FS-wBCO on the BC, GL, SO, HR, WDBC, VC, MUS and ARR datasets, with the number of selected features in parentheses.
The kappa statistic (KS) (Cohen, 1960) is calculated via Eqs. (11) to (13). The aim of using this measure is to assess the level of agreement between the classifier's output and the actual classes of the dataset. The kappa statistic is calculated as follows:

$P_A = \frac{\sum_{i=1}^{c} N_{ii}}{N}$  (11)

$P_E = \sum_{i=1}^{c} \frac{N_{i\cdot}}{N} \cdot \frac{N_{\cdot i}}{N}$  (12)

$KS = \frac{P_A - P_E}{1 - P_E}$  (13)

where c is the number of classes, N is the total number of samples, $N_{ii}$ is the number of samples of class i classified as class i, and $N_{i\cdot}$ and $N_{\cdot i}$ are the row and column totals of the confusion matrix for class i.
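As a sketch, Eqs. (11) to (13) can be computed from a confusion matrix as follows; this is illustrative Python, not the authors' implementation.

def kappa_statistic(conf):
    # conf[i][j]: number of samples of actual class i predicted as class j.
    N = sum(sum(row) for row in conf)
    c = len(conf)
    p_a = sum(conf[i][i] for i in range(c)) / N                 # Eq. (11)
    p_e = sum(sum(conf[i]) * sum(row[i] for row in conf)
              for i in range(c)) / (N * N)                      # Eq. (12)
    return (p_a - p_e) / (1 - p_e)                              # Eq. (13)

# Example: a two-class confusion matrix with 85% accuracy gives KS = 0.7.
print(kappa_statistic([[45, 5], [10, 40]]))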
Table 9
Wilcoxon test of wBCO and the competitors (BCO, DisrBCO, CDBCO and DBCO), reporting the pairwise test values for each first/second algorithm combination.
Table 10
Wilcoxon test of FS-wBCO and the competitors.

The Wilcoxon signed-ranks test (Wilcoxon, 1945; Demsar, 2006) compares two algorithms through the statistic

$T = \min(S^{+}, S^{-})$  (14)

where the values of $S^{+}$ and $S^{-}$ are measured using Eqs. (15) and (16), respectively:

$S^{+} = \sum_{d_i > 0} rank(d_i) + \frac{1}{2} \sum_{d_i = 0} rank(d_i)$  (15)

$S^{-} = \sum_{d_i < 0} rank(d_i) + \frac{1}{2} \sum_{d_i = 0} rank(d_i)$  (16)

where $d_i$ is the difference between the performance scores of the two algorithms on the ith problem and $rank(d_i)$ is the rank of $|d_i|$.
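A minimal Python sketch of Eqs. (14) to (16) follows; ties in $|d_i|$ are ranked consecutively here for brevity, whereas a full implementation (e.g., scipy.stats.wilcoxon) averages tied ranks.

def wilcoxon_T(diffs):
    # diffs: per-function (or per-dataset) differences d_i between the
    # scores of the two compared algorithms.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    rank = {i: r + 1 for r, i in enumerate(order)}
    zeros = 0.5 * sum(rank[i] for i, d in enumerate(diffs) if d == 0)
    s_plus  = sum(rank[i] for i, d in enumerate(diffs) if d > 0) + zeros   # Eq. (15)
    s_minus = sum(rank[i] for i, d in enumerate(diffs) if d < 0) + zeros   # Eq. (16)
    return min(s_plus, s_minus)                                            # Eq. (14)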
Experiments were carried out using the SVM classifier and benchmark datasets obtained from the UCI machine learning repository. FS-wBCO improved conventional BCO-based feature selection and gained superiority over BCOFS. It also showed superiority over other swarm- and evolutionary-based algorithms, but had some inferiorities to its competitors (also demonstrated in the Wilcoxon signed-ranks tests). This could result from the dataset division, as a hold-out strategy was used. FS-wBCO and BPSO performed largely similarly, with FS-wBCO showing superiority on a few datasets.
As part of our future work, we plan to further investigate the proposed algorithms with more functions. The FS-wBCO algorithm will be executed with other classifiers such as decision trees, ANNs, k-nearest neighbors, etc., and experiments will be conducted using alternative methodologies such as leave-one-out or n-fold cross validation. Also, wBCO will be applied to regression problems and will be improved to be applicable to continuous problems. The local weighting procedure will be modified to measure the correlation between the features and the dependent variables.
References
Akbari, R., Hedayatzadeh, R., Ziarati, K., Hassanizadeh, B., 2012. A multi-objective artificial bee colony algorithm. Swarm Evol. Comput. 2, 39–52.
Alatas, B., 2010. Chaotic bee colony algorithms for global numerical optimization. Expert Syst. Appl. 37, 5682–5687.
Alzaqebah, M., Abdullah, S., 2014. Hybrid bee colony optimization for examination timetabling problems. Comput. Oper. Res. 54, 142–154.
Chyzhyk, D., Savio, A., Grana, M., 2014. Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI. Neurocomputing 128, 73–80.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20 (1), 37–46.
Da Silva, S., Ribeiro, M., Batista Neto, J., Traina-Jr, C., Traina, A., 2011. Improving the ranking quality of medical image retrieval using a genetic feature selection method. Decis. Support Syst. 51, 810–820.
Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30.
Dorigo, M., Caro, G., Gambardella, L.M., 1999. Ant algorithms for discrete optimization. Artif. Life 5 (2), 137–172.
Dziwiński, P., Bartczuk, Ł., Starczewski, J., 2012. Fully controllable ant colony system for text data clustering. Lect. Notes Comput. Sci. 7269, 199–205.
Forsati, R., Keikhah, A., Shamsfard, M., 2015. An improved bee colony optimization algorithm with an application to document clustering. Neurocomputing.
Forsati, R., Moayedikia, A., Keikhah, A., 2012. A novel approach for feature selection based on the bee colony optimization. Int. J. Comput. Appl. 43 (8), 30–34.
Forsati, R., Moayedikia, A., Safarkhani, B., 2011. Heuristic approach to solve feature selection problem. Lect. Notes Comput. Sci., 707–717. Springer-Verlag, Berlin Heidelberg.
Forsati, R., Moayedikia, A., Jensen, R., Shamsfard, M., Meybodi, M., 2014. Enriched ant colony optimization and its application in feature selection. Neurocomputing 142, 354–371.
Gao, W., Liu, S., 2011. Improved artificial bee colony algorithm for global optimization. Inf. Process. Lett. 111, 871–882.
Ghareh Mohammadi, F., Saniee Abadeh, M., 2014. Image steganalysis using a bee colony based feature selection algorithm. Eng. Appl. Artif. Intell. 31, 35–43.
Huang, J., Cai, Y., Xu, X., 2007. A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recognit. Lett. 28, 1825–1844.
Huang, Y., Lin, J., 2011. A new bee colony optimization algorithm with idle-time-based filtering scheme for open shop-scheduling problems. Expert Syst. Appl. 38, 5438–5447.
Imanian, N., Shiri, M., Moradi, P., 2014. Velocity based artificial bee colony algorithm for high dimensional continuous optimization problems. Eng. Appl. Artif. Intell. 36, 148–163.
Jensen, R., Shen, Q., 2004. Fuzzy–rough attribute reduction with application to web categorization. Fuzzy Sets Syst. 141 (3), 469–485.
Kang, F., Li, J., Ma, Z., 2011. Rosenbrock artificial bee colony algorithm for accurate global optimization of numerical functions. Inf. Sci. 12, 3508–3531.
Karaboga, D., Akay, B., 2011. A modified artificial bee colony (ABC) algorithm for constrained optimization problems. Appl. Soft Comput. 11, 3021–3031.
Karaboga, D., Akay, B., 2009. A survey: algorithms simulating bee swarm intelligence. Artif. Intell. Rev. 31 (1–4), 61–85.
Karaboga, D., Gorkemli, B., Ozturk, C., Karaboga, N., 2014. A comprehensive survey: artificial bee colony (ABC) algorithm and applications. Artif. Intell. Rev. 42 (1), 21–57.
Kashan, M., Nahavandi, N., Kashan, A., 2012. DisABC: a new artificial bee colony algorithm for binary optimization. Appl. Soft Comput. 12 (1), 342–352.
Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: IEEE International Conference on Neural Networks: Proceedings, pp. 1942–1948.
Kumar, R., 2014. Directed bee colony optimization algorithm. Swarm Evol. Comput. 17, 60–73.
Kumar, R., Sadu, A., Kumar, R., Panda, S., 2012. A novel multi-objective directed bee colony optimization algorithm for multi-objective emission constrained economic power dispatch. Electr. Power Energy Syst. 43, 1241–1250.
Li, B., Li, Y., Gong, L., 2014. Protein secondary structure optimization using an improved artificial bee colony algorithm based on AB off-lattice model. Eng. Appl. Artif. Intell. 27, 70–79.
Li, G., Niu, P., Xiao, X., 2012. Development and investigation of efficient artificial bee colony algorithm for numerical function optimization. Appl. Soft Comput. 12, 320–332.
Liu, H., Motoda, H., 1998. Feature Selection for Knowledge Discovery and Data Mining. Springer Science & Business Media.
Lu, P., Zhou, J., Zhang, H., Zhang, R., Wang, C., 2014. Chaotic differential bee colony optimization algorithm for dynamic economic dispatch problem with valve-point effects. Electr. Power Energy Syst. 62, 130–143.
Monirul Kabir, M., Shahjahan, M., Murase, K., 2011. A new local search based hybrid genetic algorithm for feature selection. Neurocomputing 74, 2914–2928.
Nikolic, M., Teodorovic, D., 2013. Empirical study of the bee colony optimization (BCO) algorithm. Expert Syst. Appl. 40, 4609–4620.
Shen, Q., Jensen, R., 2004. Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring. Pattern Recogn. 37 (7), 1351–1363.
Teodorovic, D., Lucic, P., Markovic, G., Dell'Orco, M., 2006. Bee colony optimization: principles and applications. In: Eighth Seminar on Neural Network Applications in Electrical Engineering, NEUREL 2006, pp. 151–156.
Tsymbal, A., Pechenizkiy, M., Cunningham, P., 2005. Diversity in search strategies for ensemble feature selection. Inf. Fusion 6 (1), 83–98.
Unler, A., Murat, A., 2010. A discrete particle swarm optimization method for feature selection in binary classification problems. Eur. J. Oper. Res. 206, 528–539.
Unler, A., Murat, A., Chinnam, R., 2011. mr2PSO: a maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf. Sci. 181, 4625–4641.
Wilcoxon, F., 1945. Individual comparisons by ranking methods. Biometrics 1, 80–83.