
Journal of Intelligent & Fuzzy Systems 32 (2017) 1775-1786
DOI: 10.3233/JIFS-152381
IOS Press

A novel object tracking algorithm by fusing color and depth information based on single valued neutrosophic cross-entropy

Keli Hu^a,*, Jun Ye^a, En Fan^a, Shigen Shen^a, Longjun Huang^a and Jiatian Pi^b

^a Department of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China
^b Key Laboratory of Wireless Sensor Network and Communication, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, China

*Corresponding author: Keli Hu, Department of Computer Science and Engineering, Shaoxing University, Shaoxing 312000, Zhejiang, China. Tel.: +86 575 88341512; E-mail: ancimoon@gmail.com.

Abstract. Although appearance-based trackers have improved greatly in the last decade, they still struggle with challenges such as occlusion, blur, fast motion, and deformation. Occlusion remains one of the hardest of these challenges for visual tracking, and the others are not fully resolved by existing trackers either. In this work, we focus on tackling the latter group of challenges in both the color and depth domains. The neutrosophic set (NS) is a new branch of philosophy for dealing with incomplete, indeterminate and inconsistent information. In this paper, we utilize the single valued neutrosophic set (SVNS), a subclass of NS, to build a robust tracker. First, color and depth histograms are employed as the appearance features, and both features are represented in the SVNS domain via three membership functions T, I, and F. Second, the single valued neutrosophic cross-entropy measure is utilized for fusing the color and depth information. Finally, a novel SVNS-based MeanShift tracker is proposed. Applied to the video sequences without serious occlusion in the Princeton RGBD Tracking dataset, the performance of our method was compared with that of state-of-the-art trackers. The results reveal that our method outperforms these trackers when dealing with challenging factors such as blur, fast motion, deformation, illumination variation, and camera jitter.

Keywords: Object tracking, RGBD, fusion, single valued neutrosophic set, cross-entropy measure

1. Introduction

Object tracking has been extensively studied in computer vision due to its applications such as surveillance, human-computer interaction, video indexing, and traffic monitoring, to name a few. While much effort has been devoted to it in the past decades [30, 31, 33], it remains very challenging to build a robust tracking system that copes with occlusion, blur, fast motion, deformation, illumination variation, and rotation.

There are mainly two ways to tackle these problems. One is utilizing robust features. Color models are frequently employed for tracking because of their robustness to blur, deformation, rotation, and the like. MeanShift [9] and CAMShift [7] employ color information to separate the object from the background; both trackers perform well unless a similar color appears around the target. The cross-bin metric [22], SIFT [36] and texture features [6] were introduced into mean-shift-based trackers, and the enhanced trackers outperform the corresponding traditional ones. The other way is training an effective adaptive appearance model.


Semi-supervised boosting [10] was employed for building a robust classifier; multiple instance learning [4] was introduced into the classifier training procedure to cope with the interference of inexact training instances; and compressive sensing theory [18] was applied to develop effective and efficient appearance models robust to pose variation, illumination change, occlusion, and motion blur. In addition, Ross et al. [25] proposed the IVT method for dealing with appearance variation; other schemes such as the kernelized structured support vector machine [16], LGT [8] and TLD [19] also perform well. Although many efforts have been made to handle occlusion [1, 8, 18, 34], it remains one of the hardest challenges for visual tracking, and other challenges are also not fully resolved by existing trackers [1-17].

All the trackers mentioned above are based on RGB information; the depth information is simply discarded. This is mainly due to three reasons. Firstly, most cameras cannot provide depth information. Secondly, although a multi-camera stereo rig can, 3D reconstruction remains a very challenging task [17]. Lastly, although depth sensors such as the Microsoft Kinect, Asus Xtion and PrimeSense produce both depth and RGB information, each of them has limitations: for example, all of them are sensitive to distance and illumination, and reliable depth information can only be obtained in a limited range, e.g. 0.8-3.5 meters for the Kinect. Besides, the lack of an effective scheme for fusing depth and RGB information is another main reason. Because RGBD data provide an additional dimension of information for object tracking, many algorithms [2, 14, 15, 23, 27, 28] based on RGBD information have been proposed. However, most of them focus on tracking a specific kind of target, e.g. people [2, 14, 15, 23] or hands [28]; few category-free RGBD trackers have been proposed. Owing to problems like blur, fast motion, deformation, illumination variation and rotation, tracking an object without occlusion in a small area is still a very challenging job for both RGB and RGBD trackers. Thus, finding an effective way to improve existing category-free trackers by using RGBD information is very meaningful [27].

Neutrosophic set (NS) theory [26] is a new branch of philosophy dealing with the origin, nature and scope of neutralities. It has an inherent ability to handle indeterminate information, such as the noise in images [3, 11, 20, 35] and video sequences. To date, NS theory has been successfully applied in many computer vision fields, such as image segmentation [3, 11, 20, 35] and skeleton extraction [13]. For image segmentation, a specific neutrosophic image is usually computed [3, 11, 20, 35]; an NS-based cost function between two neighboring voxels was proposed in [13]. NS theory has also been utilized to improve clustering algorithms such as c-means [12]. Decision making can be regarded as a problem-solving activity terminated by a solution deemed to be satisfactory, and many NS-based decision-making methods [5, 21, 24, 32] have been proposed. A single valued neutrosophic set (SVNS) [29] is an instance of a neutrosophic set and provides an additional possibility to represent the uncertain, imprecise, incomplete and inconsistent information that exists in the real world. Several SVNS-based algorithms, combined with other metrics, have therefore been proposed for handling multicriteria decision-making problems: Biswas et al. [5] proposed a TOPSIS method for multi-attribute group decision making, and the single valued neutrosophic cross-entropy was introduced in [32]. Fusing features from the color and depth domains is also an indeterminate problem, and it can be translated into a kind of decision-making problem. NS theory is still an open area for information fusion applications, so it is meaningful to build a bridge between NS theory and information fusion.

1.1. Proposed contribution

In this work, the proposed tracking algorithm based on RGBD data makes three main contributions. First, a more accurate ROI for initializing the target model is obtained using depth-based segmentation. Second, the depth distance of the target between adjacent frames is incorporated into the back-projection to facilitate object discrimination. Finally, a color-depth fusion method based on the single valued neutrosophic cross-entropy is proposed to enhance object tracking. To our knowledge, this is the first time NS theory has been introduced into the visual object tracking domain.

The remainder of this paper is organized as follows. Section 2 first gives the main steps and basic flow of our algorithm and then details each step in the corresponding subsections. Experimental evaluations and discussions are presented in Section 3, and Section 4 concludes the paper.

2. Problem formulation

In this section, we present the algorithmic details of this paper.

Fig. 1. Main steps of the proposed algorithm.

Table 1
Basic flow of the proposed tracking algorithm

Algorithm 1 SVNCE Tracking
Initialization
Input: 1-st video frame in the color and depth domain
1) Select an object on the image plane as the target
2) Select object seeds in the depth domain
3) Extract the object from the image using the depth information
4) Calculate the corresponding color and depth histograms as the object model
Tracking
Input: (t+1)-th video frame in the color and depth domain
1) Calculate back-projections in both the color and depth domains
2) Represent both features in the NS domain via the three membership subsets T, I, and F
3) Fuse the color and depth information using the single valued neutrosophic cross-entropy method
4) Find the location of the object in the CAMShift framework
5) Update the object model and seeds
Output: Tracking location

The main steps of our tracking system are summarized in Fig. 1, and Algorithm 1, shown in Table 1, gives the basic flow. Details of each main step are given in the following subsections.
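To make the flow concrete, the following Python sketch mirrors Algorithm 1. It is an illustrative skeleton of ours rather than the authors' implementation: every helper name (select_seeds, grow_target_region, backproject_depth, and so on) is a placeholder, and most of them are sketched after the corresponding subsections below; feature_cross_entropies stands for the SVNS machinery of Section 2.3.

    import numpy as np

    def color_histogram(binned, mask, bins=32):
        # Eq. (3): normalized histogram over the extracted target mask;
        # 'binned' holds precomputed feature-bin indices per pixel.
        h = np.bincount(binned[mask].ravel(), minlength=bins).astype(float)
        return h / max(h.sum(), 1.0)

    def backproject_color(binned, q_c):
        # Eq. (4): look each pixel's color bin up in the model histogram.
        return q_c[binned]

    def track_sequence(color_frames, depth_frames, init_bbox):
        # Initialization on the first frame (upper half of Algorithm 1).
        seeds = select_seeds(depth_frames[0], init_bbox)                  # Eq. (1)
        mask = grow_target_region(depth_frames[0], seeds)                 # Eq. (2)
        q_c = color_histogram(color_frames[0], mask)
        q_d = color_histogram(depth_frames[0], mask)                      # same form, depth bins
        bbox, track = init_bbox, []
        # Tracking loop (lower half of Algorithm 1).
        for binned, depth in zip(color_frames[1:], depth_frames[1:]):
            p_c = backproject_color(binned, q_c)                          # Eq. (4)
            p_d = backproject_depth(depth, seeds, bbox)                   # Eqs. (5)-(6)
            d_c, d_d = feature_cross_entropies(binned, depth, q_c, q_d, bbox)  # Eqs. (7)-(12)
            p = fuse_backprojections(p_c, p_d, d_c, d_d)                  # Eq. (13)
            bbox = camshift_step(p, bbox)                                 # Eq. (14)
            seeds = select_seeds(depth, bbox)                             # re-seed near the new box
            mask = grow_target_region(depth, seeds)
            q_c = update_color_model(q_c, color_histogram(binned, mask))  # Eq. (15)
            q_d = color_histogram(depth, mask)                            # Eq. (16): replace depth model
            track.append(bbox)
        return track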

2.1. Extracting the object

A bounding box (as shown in Fig. 2) is the way most trackers indicate the location of the target [30, 31, 33]. The color information inside the bounding box is frequently employed to initialize the target model [6, 7, 9, 22], represented as a color histogram. However, in addition to the target, the area inside the bounding box often contains background; such a model therefore cannot represent the target's features exactly.

Given an initial bounding box (as shown in the left part of Fig. 2), we try to extract the target's area with the help of the depth data, using the depth-based method proposed below.

Fig. 2. Example of target extracting.

It is reasonable to assume that the target area is the part of the scene closest to the camera and that the target occupies most of the bounding box. The target seeds can then be selected automatically by

S = { x | RD(x) = r_i, i = 1...n }   (1)

where RD(x) is the distance rank in the bounding box; e.g., RD(x) = 5% means that the depth at pixel location x is placed at order 5%·N_Bbox when the depths of all points in the bounding box are sorted from near to far, N_Bbox being the total number of pixels in the bounding box.

The target region A then starts from these n seed points. Let B be the set of points that border at least one point of A; at each loop, the target region is updated by

A⁺ = { x | |D(x) − mean_{x∈A}{D(x)}| < T, x ∈ B }   (2)

where A⁺ is the newly added pixel set and D(x) is the depth at pixel location x.

In this work, we set five ranks in Equation (1) for seed selection, where r_1, r_2, r_3, r_4, r_5 are set to 10%, 15%, 20%, 25% and 30%, respectively. Experimental results have proved the robustness of this choice. As shown in Fig. 2, given a rough bounding box, our method extracts the target area much more exactly. The rough edge of the extracted target is mainly caused by noise in the source depth data; because the RGB and depth data are not tightly aligned, a dislocation of the thumb also occurs.

2.2. Calculating back-projections

A back-projection is a probability distribution map with the same size as the input image. Each pixel value of the back-projection indicates the likelihood that the corresponding pixel lies inside the tracked object on the current image plane. Before calculating the back-projection, we build the object model in both the color and depth domains once the target area is extracted.

Let {x_i}_{i=1...n} be the pixel locations in the target area, and let the function b: R² → {1...m} associate to the pixel at location x_i the index b(x_i) of its bin in the quantized feature space. The probability of the feature u = 1, 2, ..., m in the object model is then computed by

q_u^c = C Σ_{i=1}^{n} δ[b_c(x_i) − u],   q_u^d = D Σ_{i=1}^{n} δ[b_d(x_i) − u]   (3)

where b_c is the transformation function in the color domain, b_d is its counterpart in the depth domain, and δ is the Kronecker delta function. C and D are normalization constants derived by imposing the conditions Σ_{u=1}^{m_c} q_u^c = 1 and Σ_{u=1}^{m_d} q_u^d = 1.

2.2.1. Calculating the back-projection in the color domain
As shown in Algorithm 1, back-projections in both the color and depth domains are computed in every loop of the tracking procedure. The color histogram q_u^c is employed to calculate the back-projection in the color domain:

P_c(x) = q_{b_c(x)}^c   (4)

where x is the pixel location.

2.2.2. Calculating the back-projection in the depth domain
We assume that the object to be tracked moves at a relatively low speed. For a space point reconstructed by an RGBD sensor, it is reasonable to assign it a higher probability the closer it is to the target's location at the previous time. In addition, only pixels that are not very far from the previous location of the target on the image plane can belong to the target. Thus, when only this spatial restriction is considered, the probability that pixel x belongs to the target at the current time can be calculated by

P_d(x) = (1/2) erfc( 4√2 · d(x,T) / MAXD ),   x ∈ 2R_pre   (5)

where MAXD is the maximum depth distance between the previous and current target locations, 2R_pre is the set of points covered by a bounding box of twice the size of the previous one but with the same center, and d(x,T) is the distance between a point and the target, approximately calculated by

d(x,T) = min_i |D(S_{r_i}) − D(x)|,   i = 1...n   (6)

where S_{r_i} is the i-th seed of the target.

2.3. Fusing color and depth information

Employing a discriminative feature is one of the most critical factors for a robust tracker, and how to select a discriminative feature during tracking is still an open issue. A well discriminating feature can effectively set the target apart from a cluttered background. Our tracker uses color and depth features; however, similar depth or color may appear around the target, and the tracker will fail if a poor feature is applied. To build a robust feature fusion mechanism, the single valued neutrosophic cross-entropy measure [32] is utilized here.
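Before moving to the cross-entropy machinery, the sketch below makes the extraction and depth back-projection steps above (Eqs. (1), (2), (5) and (6)) concrete using NumPy/SciPy. The helper double_box_mask, the simple iteration scheme, and the parameter defaults (taken from Section 3.1) are our own choices, and Equation (5) is transcribed as printed, so treat the exact erfc argument as an assumption.

    import numpy as np
    from scipy.special import erfc
    from scipy.ndimage import binary_dilation

    def select_seeds(depth, bbox, ranks=(0.10, 0.15, 0.20, 0.25, 0.30)):
        # Eq. (1): pick one seed pixel at each fixed rank of the sorted box depths.
        x0, y0, w, h = bbox
        d = depth[y0:y0 + h, x0:x0 + w].ravel()
        order = np.argsort(d)                           # near-to-far ordering
        idx = [order[int(r * d.size)] for r in ranks]
        ys, xs = np.unravel_index(idx, (h, w))
        return [(x0 + int(x), y0 + int(y)) for y, x in zip(ys, xs)]

    def grow_target_region(depth, seeds, T=60.0):
        # Eq. (2): repeatedly absorb bordering pixels whose depth stays
        # within T (millimeters here) of the current region mean.
        mask = np.zeros(depth.shape, dtype=bool)
        for x, y in seeds:
            mask[y, x] = True
        while True:
            border = binary_dilation(mask) & ~mask      # candidate set B
            accept = border & (np.abs(depth - depth[mask].mean()) < T)
            if not accept.any():
                return mask
            mask |= accept

    def double_box_mask(shape, bbox):
        # Region 2R_pre: same center as bbox, twice the width and height.
        x0, y0, w, h = bbox
        x1, y1 = int(x0 - w / 2), int(y0 - h / 2)
        m = np.zeros(shape, dtype=bool)
        m[max(y1, 0):y1 + 2 * h, max(x1, 0):x1 + 2 * w] = True
        return m

    def backproject_depth(depth, seeds, bbox, maxd=70.0):
        # Eqs. (5)-(6): map the depth distance to the nearest seed through erfc,
        # restricted to the double-sized search box around the previous location.
        d_seed = np.min([np.abs(depth - depth[y, x]) for x, y in seeds], axis=0)
        p = 0.5 * erfc(4.0 * np.sqrt(2.0) * d_seed / maxd)
        p[~double_box_mask(depth.shape, bbox)] = 0.0
        return p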

2.3.1. Cross-entropy measure of SVNSs for decision making
For a multicriteria decision-making problem, let A = {A_1, A_2, ..., A_m} be a set of alternatives and C = {C_1, C_2, ..., C_n} be a set of criteria. Assume that w_j is the weight of the criterion C_j, with w_j ∈ [0, 1] and Σ_{j=1}^{n} w_j = 1. Then the character of an alternative A_i (i = 1, 2...m) can be represented by the following SVNS information:

A_i = { ⟨C_j, T_{C_j}(A_i), I_{C_j}(A_i), F_{C_j}(A_i)⟩ | C_j ∈ C },   i = 1, 2...m, j = 1, 2...n   (7)

where T_{C_j}(A_i), I_{C_j}(A_i), F_{C_j}(A_i) ∈ [0, 1]. T_{C_j}(A_i) denotes the degree to which the alternative A_i satisfies the criterion C_j, I_{C_j}(A_i) the indeterminacy degree to which A_i satisfies or does not satisfy C_j, and F_{C_j}(A_i) the degree to which A_i does not satisfy C_j.

In the multicriteria decision-making problem, a weighted cross-entropy measure between any alternative A_i and the ideal alternative A* = {⟨1, 0, 0⟩, ⟨1, 0, 0⟩, ..., ⟨1, 0, 0⟩} is proposed in the SVNS domain [32] as follows:

D_i = Σ_{j=1}^{n} w_j [ log₂( 1 / ((1/2)(1 + T_{C_j}(A_i))) ) + log₂( 1 / (1 − (1/2) I_{C_j}(A_i)) ) + log₂( 1 / (1 − (1/2) F_{C_j}(A_i)) ) ]
    + Σ_{j=1}^{n} w_j [ T_{C_j}(A_i) log₂( T_{C_j}(A_i) / ((1/2)(1 + T_{C_j}(A_i))) ) + (1 − T_{C_j}(A_i)) log₂( (1 − T_{C_j}(A_i)) / (1 − (1/2)(1 + T_{C_j}(A_i))) ) ]
    + Σ_{j=1}^{n} w_j [ I_{C_j}(A_i) + (1 − I_{C_j}(A_i)) log₂( (1 − I_{C_j}(A_i)) / (1 − (1/2) I_{C_j}(A_i)) ) ]
    + Σ_{j=1}^{n} w_j [ F_{C_j}(A_i) + (1 − F_{C_j}(A_i)) log₂( (1 − F_{C_j}(A_i)) / (1 − (1/2) F_{C_j}(A_i)) ) ]   (8)

The smaller the value of D_i, the closer the alternative A_i is to the ideal alternative. Thus, after calculating each D_i, we can decide which alternative is the best.

2.3.2. Information fusion
Both the color and depth features are expressed in the SVNS domain by A_i = {T_{C_j}(A_i), I_{C_j}(A_i), F_{C_j}(A_i)}. Each feature corresponds to an alternative: A_c corresponds to the color feature, and A_d corresponds to the depth feature. For the proposition "the color feature is a discriminative feature", T(A_c), I(A_c) and F(A_c) represent the degrees to which the proposition is true, indeterminate and false, respectively. Using the near region similarity criterion C_n, we define:

T_{C_n}(A_c) = Σ_{u=1}^{m_c} √( q_u^c p_u^c )   (9)

I_{C_n}(A_c) = Σ_{u=1}^{m_c} √( q_u^c p̄_u^c )   (10)

F_{C_n}(A_c) = 1 − T_{C_n}(A_c)   (11)

Equation (9) is the Bhattacharyya coefficient, which is frequently employed as a similarity judgment [9]; here q_u^c is the object model in the color domain and p_u^c is the histogram feature corresponding to the tracking location (a rectangular bounding box, region G in Fig. 3) in the previous frame.

The indeterminacy degree to which the alternative A_c satisfies or does not satisfy the criteria is defined in Equation (10), where p̄_u^c corresponds to the near region G_n = G′ − G, the ring between the tracking box G and an enlarged box G′, as shown in Fig. 3. Both p_u^c and p̄_u^c are computed using Equation (3).

As the location estimated by the tracker may sometimes drift from the target, we integrate a further condition, the far region similarity criterion C_f, into the multicriteria decision-making problem to make the information fusion robust to tracking location errors. In the SVNS, the three functions under the far region similarity criterion C_f are defined as:

T_{C_f}(A_c) = T_{C_n}(A_c),   I_{C_f}(A_c) = Σ_{u=1}^{m_c} √( q_u^c p̃_u^c ),   F_{C_f}(A_c) = F_{C_n}(A_c)   (12)

where p̃_u^c is the histogram feature corresponding to the far region G_f = G″ − G′, the ring lying outside G_n, as shown in Fig. 3.

Fig. 3. Region illustration for information fusion.
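Under the same caveat, a sketch of ours rather than the authors' code, the fragment below implements the cross-entropy of Equation (8) against the ideal alternative ⟨1, 0, 0⟩ and the T/I/F construction of Equations (9)-(12); equal criterion weights are assumed, since the paper does not state them.

    import numpy as np

    def bhattacharyya(q, p):
        # Similarity of two normalized histograms, as used in Eqs. (9)-(12).
        return float(np.sum(np.sqrt(q * p)))

    def tif_near_far(q, p_track, p_near, p_far):
        # Eqs. (9)-(12): SVNS triples of one feature under criteria Cn and Cf.
        t = bhattacharyya(q, p_track)      # Eq. (9): truth degree
        f = 1.0 - t                        # Eq. (11): falsity degree
        return (t, bhattacharyya(q, p_near), f), (t, bhattacharyya(q, p_far), f)

    def svn_cross_entropy(tif_list, weights=None, eps=1e-12):
        # Eq. (8): weighted cross-entropy between an alternative, given as a
        # list of (T, I, F) triples (one per criterion), and the ideal <1,0,0>.
        n = len(tif_list)
        w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
        d = 0.0
        for wj, (T, I, F) in zip(w, tif_list):
            half_t = 0.5 * (1.0 + T)
            d += wj * (np.log2(1.0 / half_t)
                       + np.log2(1.0 / (1.0 - 0.5 * I))
                       + np.log2(1.0 / (1.0 - 0.5 * F)))
            d += wj * (T * np.log2(T / half_t + eps)
                       + (1.0 - T) * np.log2(((1.0 - T) + eps) / ((1.0 - half_t) + eps)))
            d += wj * (I + (1.0 - I) * np.log2((1.0 - I) / (1.0 - 0.5 * I) + eps))
            d += wj * (F + (1.0 - F) * np.log2((1.0 - F) / (1.0 - 0.5 * F) + eps))
        return d

With the color triples from Equations (9)-(12), D_c = svn_cross_entropy([tif_cn_c, tif_cf_c]); the depth value D_d is obtained the same way, and the smaller of the two identifies the main feature used in Equation (13) below.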

 
The subsets T, I and F for the depth feature are computed by the same steps. For the proposition "the depth feature is a discriminative feature", T(A_d), I(A_d) and F(A_d) represent the true, indeterminate and false degrees, respectively. Applying the criteria C_n and C_f, the related functions are:

T_{C_n}(A_d) = Σ_{u=1}^{m_d} √( q_u^d p_u^d ),   I_{C_n}(A_d) = Σ_{u=1}^{m_d} √( q_u^d p̄_u^d ),   F_{C_n}(A_d) = 1 − T_{C_n}(A_d)

T_{C_f}(A_d) = T_{C_n}(A_d),   I_{C_f}(A_d) = Σ_{u=1}^{m_d} √( q_u^d p̃_u^d ),   F_{C_f}(A_d) = F_{C_n}(A_d)

where q_u^d is the object depth model, and p̄_u^d and p̃_u^d are the corresponding depth histogram features for the regions G_n and G_f, respectively.

Substituting T_{C_n}(A_c), I_{C_n}(A_c), F_{C_n}(A_c), T_{C_f}(A_c), I_{C_f}(A_c), F_{C_f}(A_c) and T_{C_n}(A_d), I_{C_n}(A_d), F_{C_n}(A_d), T_{C_f}(A_d), I_{C_f}(A_d), F_{C_f}(A_d) into Equation (8), we obtain two values, D_c and D_d, from which we can decide which feature to choose. Because a single feature may result in a confusing back-projection, we fuse the features of both domains; the new back-projection after fusion is defined as

P(x) = P_c(x)   if D_c ≤ D_d and P_d(x) ≥ T_d;
       0        if D_c ≤ D_d and P_d(x) < T_d;
       P_d(x)   if D_c > D_d and P_c(x) ≥ T_c;
       0        if D_c > D_d and P_c(x) < T_c.   (13)

where T_d and T_c are two thresholds. As shown in Equation (13), the feature with the smaller cross-entropy value decides the final back-projection; we call it the main feature. The remaining feature is used to remove possible noise.

2.4. Tracking the object

The core of the CAMShift algorithm [7] is employed here. We choose the previous bounding box of the target as the mean shift window. The tracking location is then calculated by

x̄ = M_10 / M_00,   ȳ = M_01 / M_00   (14)

where M_10 = Σ_{x∈BBox} x·P(x), M_01 = Σ_{x∈BBox} y·P(x), and M_00 = Σ_{x∈BBox} P(x). The size of the bounding box is then s = 2√( M_00 / 256 ).

2.5. Updating the object model

After finding the location of the target in the current frame, we update the object model in both the color and depth domains.

Before updating the object model, the object seed set is updated using Equation (1). Instead of considering all pixels located in the bounding box, only the pixels satisfying P(x) > T_s are considered when updating the object seeds during tracking. By utilizing the object seeds, the target area can be well extracted with the help of the depth information, so a more exact color histogram can be computed at each time step of the video sequence. Let p_u^c(t) be the color histogram corresponding to the extracted target area at time t; the updated color model of the object is then calculated by

q_u^c(t) = λ q_u^c(t−1) + (1 − λ) p_u^c(t)   (15)

where λ ∈ (0, 1).

Considering that the depth distribution of the target shifts faster than the color's, and that the extracted target area is relatively credible, we replace the previous object depth model with the new one:

q_u^d(t) = p_u^d(t)   (16)
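The decision rule of Equation (13), the CAMShift localization of Equation (14), and the model updates of Equations (15)-(16) then reduce to a few lines. This is again our sketch: the printed symbol for the learning rate in Equation (15) was lost in extraction, so lam and its default are placeholders, and the 256 in the size rule follows the CAMShift convention of 8-bit back-projection values.

    import numpy as np

    def fuse_backprojections(p_c, p_d, d_c, d_d, t_c=0.1, t_d=0.1):
        # Eq. (13): the feature with the smaller cross-entropy is the main
        # feature; the other one vetoes pixels that fall below its threshold.
        if d_c <= d_d:
            return np.where(p_d >= t_d, p_c, 0.0)   # color leads, depth filters
        return np.where(p_c >= t_c, p_d, 0.0)       # depth leads, color filters

    def camshift_step(p, bbox):
        # Eq. (14) plus the size rule s = 2 * sqrt(M00 / 256).
        x0, y0, w, h = bbox
        win = p[y0:y0 + h, x0:x0 + w]
        ys, xs = np.mgrid[y0:y0 + h, x0:x0 + w]
        m00 = win.sum()
        if m00 <= 0:
            return bbox                              # no evidence: keep the old box
        cx, cy = (xs * win).sum() / m00, (ys * win).sum() / m00
        s = 2.0 * np.sqrt(m00 / 256.0)
        return (int(cx - s / 2), int(cy - s / 2), int(s), int(s))

    def update_color_model(q_prev, p_target, lam=0.9):
        # Eq. (15): exponential blend of the old model and the current histogram.
        return lam * q_prev + (1.0 - lam) * p_target

    def update_depth_model(p_target):
        # Eq. (16): the depth model is simply replaced every frame.
        return p_target.copy()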

3. Experimental results and discussion

We tested our algorithm on several challenging video sequences that are publicly available in the Princeton RGBD Tracking dataset. All of the sequences were captured with a Kinect sensor, and information in both the color and depth domains is provided. As stated at the outset, we aim at a robust algorithm for tackling challenging factors such as blur, fast motion, deformation, illumination variation, and camera jitter, without considering occlusion; thus, several sequences without serious occlusion were selected as testing sequences.

To gauge absolute performance, we compare our results with four state-of-the-art trackers: CT [18], LGT [8], IVT [25] and TLD [19]. All four trackers except LGT are based on the tracking-by-detection scheme. The LGT tracker employs two layers: local patches representing the target's geometric deformation in the local layer are updated using global visual properties such as color, shape, and apparent local motion.

3.1. Setting parameters

For the proposed algorithm, five object seeds are kept during the tracking procedure; thus, in Equation (1), five ranks are selected, with r_1, r_2, r_3, r_4, r_5 set to 10%, 15%, 20%, 25% and 30%, respectively. The parameter T, which decides the accuracy of the segmentation of the target area in Equation (2), is set to 60 mm. The value of MAXD in Equation (5) depends on the displacement of the target between adjacent frames; according to the characteristics of the dataset employed in this work, MAXD is set to 70 mm. To keep enough information in the fused back-projection map, the thresholds T_c and T_d in Equation (13) should be assigned relatively low values, and both are set to 0.1 here. All parameters were kept constant for all experiments.

3.2. Evaluation criteria

Two kinds of evaluation criteria are considered. The center position error plot is based on the location error metric, and the success plot is based on the overlap metric. The center position error is the Euclidean distance between the center location of the tracked target and the manually labeled ground truth in each frame. The overlap score is defined as

s_i = area( ROI_Ti ∩ ROI_Gi ) / area( ROI_Ti ∪ ROI_Gi )   (17)

where ROI_Ti is the target bounding box in the i-th frame and ROI_Gi is the corresponding ground-truth bounding box. By setting an overlap threshold r, defined as the minimum acceptable overlap ratio, one can decide whether an output is correct; the success ratio is then calculated by

R = ( Σ_{i=1}^{N} u_i ) / N,   u_i = 1 if s_i > r, 0 otherwise   (18)

where N is the number of frames.
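Both metrics are straightforward to compute; the sketch below (ours) takes boxes as (x, y, w, h) tuples.

    import numpy as np

    def overlap_score(box_t, box_g):
        # Eq. (17): intersection-over-union of tracker and ground-truth boxes.
        xt, yt, wt, ht = box_t
        xg, yg, wg, hg = box_g
        iw = max(0, min(xt + wt, xg + wg) - max(xt, xg))
        ih = max(0, min(yt + ht, yg + hg) - max(yt, yg))
        inter = iw * ih
        union = wt * ht + wg * hg - inter
        return inter / union if union > 0 else 0.0

    def success_ratio(boxes_t, boxes_g, r):
        # Eq. (18): fraction of frames whose overlap score exceeds r; sweeping
        # r over [0, 1] yields the success plots of Figs. 9 and 10.
        scores = [overlap_score(t, g) for t, g in zip(boxes_t, boxes_g)]
        return float(np.mean([s > r for s in scores]))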
3.3. Tracking results

Screen captures for some of the clips are shown in Figs. 4-8, the quantitative plots are given in Figs. 9 and 10, and details of the video sequences are listed in Table 2. A more detailed discussion of the tracking results follows.

hand_no_occ sequence: This sequence highlights the challenges of target deformation and illumination variation. As shown in Fig. 4, both the CT and IVT trackers fail early because a large amount of background is absorbed into the object model when the trackers are initialized. The TLD tracker performs better than CT and IVT; however, as shown in frame #61, TLD also loses the target on account of the serious deformation of the hand. In contrast, the LGT tracker successfully handles the deformation. Overall, as shown in Fig. 9, our tracker gives the best performance; as seen in frames #18, #61, #134 and #175, it produces a more accurate bounding box.

Fig. 4. Screenshots of tracking results of the video sequence used for testing (hand_no_occ, target is selected in frame #1).

toy_car_no sequence: This sequence presents challenging target rotation and light changes (lights mounted on the car blink at times, as seen in frames #13 and #92 in Fig. 5). As shown in Fig. 5, all the trackers perform well before frame #23. In the course of the turn, the trackers except ours begin to lose the toy car: neither the feature models nor the model updating methods employed by CT, LGT, IVT and TLD can fit the serious change of the target's appearance and size, which leads to failures.

Fig. 5. Screenshots of tracking results of the video sequence used for testing (toy_car_no, target is selected in frame #1).

toy_no sequence: Challenges of blur, fast motion and rotation are presented in this sequence. As shown in Fig. 6, the CT and IVT trackers have already failed by frame #9 on account of blur and fast motion. Both our tracker and LGT perform well throughout the sequence; however, as shown in frame #9, the estimated bounding boxes reveal that the size of the object is often poorly estimated by the LGT tracker, which leads to failures. That is mainly because of the sudden movement of the target: the update of the local patches cannot follow such a rapid change.

Fig. 6. Screenshots of tracking results of the video sequence used for testing (toy_no, target is selected in frame #1).

zball_no1 sequence: This sequence presents the challenges of illumination change, rapid motion, camera jitter, and similar color and depth information. As shown in Fig. 7, the TLD tracker fails soon. An inappropriate bounding box size is estimated by the IVT tracker, and it also fails soon (as seen in Fig. 9). As shown in frames #76 and #77 in Fig. 7, when the tracked ball rolls into the shadow of the sofa, fast movement and camera jitter occur together; the CT tracker loses the ball, and the LGT tracker cannot produce a proper bounding box because of the sudden movement and similar background. As seen in Figs. 7 and 9, our tracker performs best.

Fig. 7. Screenshots of tracking results of the video sequence used for testing (zball_no1, target is selected in frame #1).

Other sequences: The tracking results for another four sequences are given in Figs. 8 and 10. The plot of the mean success rate over these sequences is shown in Fig. 10. As seen in Figs. 8 and 10, the proposed tracker performs best among all the trackers compared in this work.

Fig. 8. Screenshots of tracking results of the video sequences used for testing (each target is selected in frame #1). (a) new_ye_no_occ; (b) wr_no1; (c) wdog_no1; (d) zcup_move_1.

[Fig. 9 comprises success plots (success rate vs. overlap threshold) and center position error plots (error vs. frame) for the sequences hand_no_occ, toy_car_no, toy_no and zball_no1, comparing NeutRGBD (ours), FCT, LGT, IVT and TLD.]

Fig. 9. Success and center position error plots of the sequences hand_no_occ, toy_car_no, toy_no and zball_no1.

Analysis of our tracker: Figs. 9 and 10 present the tracking results in terms of center location error and success rate. Our tracker achieves much better results than the other trackers.

By extracting the target area in the depth domain, noisy background information can be filtered out (e.g. hand_no_occ and toy_no), and a more reliable object model can be obtained when the illumination or the object orientation changes (e.g. toy_car_no and zball_no1). By employing the multicriteria decision-making method in the NS domain (the key of our tracker), the information fusion enhances the robustness of the back-projection; challenges of similar information in the color and depth domains are tackled in all of the sequences. Challenges of blur and fast motion are successfully handled by using a relatively large search area and a robust back-projection.

[Fig. 10 shows the mean success plot (success rate vs. overlap threshold) of the other four sequences, comparing NeutRGBD (ours), FCT, LGT, IVT and TLD.]

Fig. 10. Mean success plot of the other sequences (new_ye_no_occ, wr_no1, wdog_no1 and zcup_move_1).

Table 2
An overview of the video sequences

Sequence        Target    Challenges                                       Frames
hand_no_occ     hand      deformation, illumination variation              196
toy_car_no      toy car   rotation, illumination variation, scale change   103
toy_no          toy       blur, fast motion, rotation                      102
zball_no1       ball      illumination variation, rapid motion,            122
                          camera jitter
new_ye_no_occ   head      rolling, appearance change                       235
wr_no1          rabbit    rolling, blur, fast motion, appearance change    156
wdog_no1        dog       rolling, blur, fast motion, appearance change    86
zcup_move_1     cup       scale change                                     370

4. Conclusions

In this paper, we present a new scheme for tracking an object in the RGBD domain. A distribution map in the depth domain is calculated by employing several object seeds, which are updated throughout the tracking procedure according to the fused back-projection. The problem of fusing information from the color and depth domains is translated into a multicriteria decision-making problem: two kinds of criteria are proposed, and the cross-entropy of SVNSs is utilized to solve the fusion problem. The resulting discriminative back-projection leads to a more robust and efficient tracker. Experimental results on challenging video sequences demonstrate that our tracker achieves favorable performance compared with several state-of-the-art algorithms. As discussed in this paper, we focus on the tracking task without serious occlusion; tackling the occlusion problem with RGBD information will be our primary mission in future work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 61603258, the Public Welfare Technology Application Research Project of Zhejiang Province in China under Grant no. 2016C31082, and the scientific research project of Shaoxing University under Grant no. 2014LG1009.

References

[1] A. Adam, E. Rivlin and I. Shimshoni, Robust fragments-based tracking using the integral histogram, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, 2006, pp. 798-805.
[2] E.J. Almazan and G.A. Jones, Tracking people across multiple non-overlapping RGB-D sensors, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2013, pp. 831-837.
[3] A.M. Anter, A.E. Hassanien, M.A.A. ElSoud and M.F. Tolba, Neutrosophic sets and fuzzy C-means clustering for improving CT liver image segmentation, Advances in Intelligent Systems and Computing 303 (2014), 193-203.
[4] B. Babenko, M.-H. Yang and S. Belongie, Robust object tracking with online multiple instance learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2011), 1619-1632.
[5] P. Biswas, S. Pramanik and B.C. Giri, TOPSIS method for multi-attribute group decision-making under single-valued neutrosophic environment, Neural Computing and Applications 27(3) (2015), 727-737.
[6] F. Bousetouane, L. Dib and H. Snoussi, Improved mean shift integrating texture and color features for robust real time object tracking, The Visual Computer 29 (2013), 155-170.
[7] G.R. Bradski, Real time face and object tracking as a component of a perceptual user interface, in: Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, 1998, pp. 214-219.

[8] L. Cehovin, M. Kristan and A. Leonardis, Robust visual tracking using an adaptive coupled-layer visual model, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 941-953.
[9] D. Comaniciu, V. Ramesh and P. Meer, Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003), 564-577.
[10] H. Grabner, C. Leistner and H. Bischof, Semi-supervised on-line boosting for robust tracking, in: European Conference on Computer Vision (ECCV), D. Forsyth, P. Torr and A. Zisserman, eds., Springer Berlin Heidelberg, Marseille, 2008, pp. 234-247.
[11] Y. Guo and A. Sengur, A novel image segmentation algorithm based on neutrosophic similarity clustering, Applied Soft Computing Journal 25 (2014), 391-398.
[12] Y. Guo and A. Sengur, NCM: Neutrosophic c-means clustering algorithm, Pattern Recognition 48 (2015), 2710-2724.
[13] Y. Guo and A. Sengur, A novel 3D skeleton algorithm based on neutrosophic cost function, Applied Soft Computing Journal 36 (2015), 210-217.
[14] J. Han, E.J. Pauwels, P.M. De Zeeuw and P.H.N. De With, Employing a RGB-D sensor for real-time tracking of humans across multiple re-entries in a smart environment, IEEE Transactions on Consumer Electronics 58 (2012), 255-263.
[15] J. Han, L. Shao, D. Xu and J. Shotton, Enhanced computer vision with Microsoft Kinect sensor: A review, IEEE Transactions on Cybernetics 43 (2013), 1318-1334.
[16] S. Hare, A. Saffari and P.H.S. Torr, Struck: Structured output tracking with kernels, in: Proceedings of the IEEE International Conference on Computer Vision, 2011, pp. 263-270.
[17] X. Hu and P. Mordohai, Evaluation of stereo confidence indoors and outdoors, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 1466-1473.
[18] K. Zhang, L. Zhang and M.-H. Yang, Fast compressive tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (2014), 2002-2015.
[19] Z. Kalal, K. Mikolajczyk and J. Matas, Tracking-learning-detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (2012), 1409-1422.
[20] E. Karabatak, Y. Guo and A. Sengur, Modified neutrosophic approach to color image segmentation, Journal of Electronic Imaging 22(1) (2013), 4049-4068.
[21] A. Kharal, A neutrosophic multi-criteria decision making method, New Mathematics and Natural Computation 10 (2014), 143-162.
[22] I. Leichter, Mean shift trackers with cross-bin metrics, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (2011), 695-706.
[23] J. Liu, Y. Liu, Y. Cui and Y.Q. Chen, Real-time human detection and tracking in complex environments using single RGBD camera, in: IEEE International Conference on Image Processing (ICIP), 2013, pp. 3088-3092.
[24] P. Majumdar, Neutrosophic sets and its applications to decision making, in: Adaptation, Learning, and Optimization, 2015, pp. 97-115.
[25] D. Ross, J. Lim, R.-S. Lin and M.-H. Yang, Incremental learning for robust visual tracking, International Journal of Computer Vision 77 (2008), 125-141.
[26] F. Smarandache, Neutrosophy: Neutrosophic probability, set and logic, American Research Press, Rehoboth, 1998, p. 105.
[27] S. Song and J. Xiao, Tracking revisited using RGBD camera: Unified benchmark and baselines, in: IEEE International Conference on Computer Vision, 2013, pp. 233-240.
[28] S. Sridhar, A. Oulasvirta and C. Theobalt, Interactive markerless articulated hand motion tracking using RGB and depth data, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2456-2463.
[29] H. Wang, F. Smarandache, Y. Zhang and R. Sunderraman, Single valued neutrosophic sets, Multispace and Multistructure 4 (2010), 410-413.
[30] Y. Wu, J. Lim and M.-H. Yang, Online object tracking: A benchmark, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 2411-2418.
[31] H. Yang, L. Shao, F. Zheng, L. Wang and Z. Song, Recent advances and trends in visual tracking: A review, Neurocomputing 74 (2011), 3823-3831.
[32] J. Ye, Single valued neutrosophic cross-entropy for multicriteria decision making problems, Applied Mathematical Modelling 38 (2014), 1170-1175.
[33] A. Yilmaz, O. Javed and M. Shah, Object tracking: A survey, ACM Computing Surveys 38 (2006), Article 13.
[34] A. Yilmaz, L. Xin and M. Shah, Contour-based object tracking with occlusion handling in video acquired using mobile cameras, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2004), 1531-1536.
[35] M. Zhang, L. Zhang and H.D. Cheng, A neutrosophic approach to image segmentation based on watershed method, Signal Processing 90 (2010), 1510-1517.
[36] C. Zhu, Video object tracking using SIFT and mean shift, Master's thesis, Communication Engineering, 2011.
