
Received December 5, 2017, accepted February 1, 2018, date of publication February 12, 2018, date of current version April 4, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2804379

Salient Object Detection and Segmentation via Ultra-Contrast
LIANGZHI TANG, FANMAN MENG, (Member, IEEE), QINGBO WU, (Member, IEEE),
NII LONGDON SOWAH, KAI TAN, AND HONGLIANG LI, (Senior Member, IEEE)
School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Corresponding author: Liangzhi Tang (tlzjay@163.com)
This work was supported in part by the National Natural Science Foundation of China under Grant 61525102, Grant 61601102, and Grant 61502084.

ABSTRACT Salient object detection aims at finding the most conspicuous objects in an image, those that most strongly catch the user's attention. Traditional contrast based salient object detection algorithms focus on highlighting the most dissimilar regions and generally fail to detect complex salient objects. In this paper, we distill a salient object detection principle from existing contrast based methods: dissimilarity produces contrast, while contrast leads to saliency. Guided by this principle, we propose a generalized framework to
detect complex salient objects. First, we propose a set of region dissimilarity definitions inspired by diverse
saliency cues. Then, multiple contrast contexts are encoded to derive dissimilarity matrices. Afterwards,
multiple contrast transformations are designed to convert dissimilarity matrices into unified ultra-contrast
features. Finally, these ultra-contrast features are mapped to saliency values through logistic regression. The
proposed framework is capable of flexibly integrating different kinds of region dissimilarity definitions,
region contexts, and contrast transformations. The experimental results demonstrate that our ultra-contrast
based saliency detection method outperforms existing contrast based algorithms in terms of three metrics on
four datasets.

INDEX TERMS Saliency detection, salient object segmentation, ultra-contrast, region dissimilarity.

I. INTRODUCTION
Saliency detection has drawn a lot of attention in recent years; it is designed to study the human ability of rapid scene understanding [1]. Existing salient object detection algorithms deal with two problems: human eye fixation and salient object detection. The second one is the more important research stream; it outputs a saliency map in which the intensity of each pixel represents its probability of belonging to salient objects. Many computer vision applications benefit from it, such as object recognition [2], [3], image cropping [4], human pose estimation [5], etc.

The common trait of existing contrast based saliency detection algorithms [1], [6]–[12] is the belief that when the mean dissimilarity of a region is higher in some context, the region is more likely to be salient. Such a principle has dominated the design of contrast based saliency detection algorithms for a long time, which limits the development of these systems to some extent. Take RC [6] as an example: it computes a region's color dissimilarities with all other regions (global context) and treats the mean dissimilarity as the region's saliency value, which means the saliency of a region depends only on its color dissimilarities with all other regions in the same image. Fig. 2 shows a set of saliency detection results generated by 10 state-of-the-art contrast based salient object detection algorithms, where the salient object is composed of four distinct basic objects: the ‘‘person’’, the ‘‘clothing’’, the ‘‘hat’’ and the ‘‘bag’’. It can be seen that all these algorithms only detect and segment a part of this salient object. RC [6] only highlights the ‘‘clothing’’, since it utilizes color dissimilarity in a global context to detect salient objects and the black color of the ‘‘clothing’’ makes it stand out from other regions. The algorithms [10]–[14] miss the ‘‘bag’’ and ‘‘hat’’, since the ‘‘bag’’ and ‘‘hat’’ touch the image boundary and all these algorithms use the boundary context to exclude salient objects.

To overcome these problems, state-of-the-art methods propose to fuse multiple saliency cues or contexts. The details can be found in Table 1, where one or more cues (contexts) are considered. Jiang et al. [15] learn a random forest regressor against a set of low-level region features drawn from different feature spaces, such as region size, local contrast, responses of filters, etc. Jiang et al. [21] generate


three saliency maps in different ways and fuse these saliency maps via element-wise multiplication. They achieve better results than the methods that only take advantage of a single cue or context. However, there are noticeable drawbacks in these methods. (a) The fusion of multiple cues is always designed in a heuristic way, which makes it hard to obtain an optimal result and may even unexpectedly degrade the performance. (b) There is a shortage of unified and efficient methods to encode multiple contexts for detecting salient objects. (c) The difference between dissimilarity, contrast, and saliency is ill-considered, which leads to an over-simplified principle that calculates the mean dissimilarity as the contrast value and assigns the contrast value as the final saliency value. Fig. 2 also shows some cases of such methods [12], [13], [15]: although simple integration of multiple saliency cues or contexts works better than a single one, it fails to highlight this complex salient object uniformly, which makes it difficult to segment.

In this paper, to overcome these drawbacks, we propose a salient object detection principle: dissimilarity produces contrast, and contrast leads to saliency. Our idea is that the saliency value of a region is encoded in its contrast feature vector space, while the corresponding contrast feature vectors are derived from its pairwise dissimilarity matrices. Guided by this principle, we propose a unified salient object detection framework, as shown in Fig. 1, which consists of four blocks: dissimilarity computation, context selection, contrast transformation and saliency calculation. In the dissimilarity computation block, a set of unified dissimilarity definitions are proposed to encode diverse saliency cues, which gives us the capability of not only integrating multiple saliency cues in the same manner but also distinguishing good cues from bad ones easily. After this, a set of region dissimilarity matrices are produced, which naturally refer to the global context since all pairwise dissimilarities over all regions are computed. On the other hand, it is worth noting that local and boundary contexts can be viewed as special cases of the global context. Thus, we can derive another two sets of dissimilarity matrices, corresponding to local and boundary contexts respectively, from the global context without extra computation time. Then, instead of simply calculating the mean dissimilarity as the final saliency value, we propose four contrast transformations to convert these dissimilarity matrices into a set of novel contrast features, named ultra-contrast features. The proposed contrast transformations are inspired by classic image contrast definitions and by the analysis of dissimilarity distributions. Finally, the saliency map is derived from the ultra-contrast features through learned logistic regression. In summary, the main contributions are listed as follows:
1) We propose a generalized salient object detection principle, i.e., dissimilarity produces contrast, while contrast leads to saliency. Guided by this principle, a unified salient object detection framework is proposed.
2) A set of dissimilarity definitions are proposed by considering both low-level and high-level saliency cues.
3) Four contrast transformations are proposed based on the statistics of dissimilarity distributions.
4) We illustrate a way to make our method capable of integrating deep learning based features for generalization.

FIGURE 1. Generalized contrast based salient object detection system consisting of four blocks: context selection, dissimilarity computation, contrast computation and saliency calculation.

TABLE 1. Statistics of contexts and saliency cues of existing salient object detection algorithms.

II. RELATED WORKS
In this section, we briefly review contrast based salient object detection algorithms according to the unified blocks of the proposed framework, i.e., context selection, dissimilarity computation and contrast computation.

A. CONTEXT SELECTION
As discussed earlier, there exist three popular contexts: local, global and boundary. Among these three, the local context is the earliest one used in saliency detection algorithms. Itti et al. [1] first propose a center-surround strategy to highlight regions with high local contrast. After that, some works [9], [23] identify salient regions from their local contexts. As for the global context, Cheng et al. [6] first compute the mean color dissimilarity with all other regions as the saliency value. The boundary context is proposed to search for salient regions among non-boundary regions. Many state-of-the-art contrast-based salient object detection algorithms [10], [12]–[14] are designed based on this idea and


FIGURE 2. An example of complex salient object detection and segmentation results produced by the proposed algorithm compared with 10 other state-of-the-art salient object detection algorithms, where the complex salient object is composed of four distinct basic objects: the ‘‘person’’, the ‘‘clothing’’, the ‘‘hat’’ and the ‘‘bag’’. From (a) to (l): (a) Input image and corresponding ground truth mask, (b) RC [6], (c) DSR [13], (d) GS [10], (e) HSD [9], (f) MC [14], (g) MR [11], (h) SO [12], (i) DRFI [15], (j) BL [16], (k) LPS [17], (l) Ours.

FIGURE 3. The proposed framework for computing ultra-contrast based saliency map. (a) Input image. (b) Over-segmentation. (c) Nine
region dissimilarity matrices, generated by integrating three dissimilarity definitions and three context types. (d) Four contrast
transformations. (e) Ultra-contrast feature vectors. (f) Ultra-contrast based saliency map.

the major differences between them are the mathematical models and the post-processing techniques. Besides, the deep learning based saliency detection works [24], [25] also utilize both local and global context to train their models. Thus, context is important to the saliency detection task and different contexts play different roles. In this paper, we encode three contexts in a unified manner, while only one context needs to be computed.

B. REGION DISSIMILARITY
In this paper, we propose a set of region dissimilarity definitions to encode diverse saliency cues. Among all existing saliency cues, color is the most frequent one. Most existing works [1], [6], [8]–[10], [12]–[15], [17], [19], [26]–[28] take color difference as the most important cue to detect salient objects. Wang et al. [20] use the gradient distribution to compute gradient contrast and combine it with color contrast to generate a saliency map. Jiang et al. [21] exploit object detection results as a complementary saliency map to boost performance. Specifically, they fuse three saliency maps based on three saliency cues via element-wise multiplication to generate the final saliency map. In addition, Meng et al. [29] exploit inter-image region dissimilarity to assist foreground object segmentation in multiple images. Instead of designing sophisticated methods to combine multiple saliency cues, we propose a unified architecture (the region dissimilarity matrix) to integrate them.

C. CONTRAST TRANSFORMATION
Contrast transformation refers to the method by which region dissimilarities are transformed into contrast values or saliency values. Most existing works [1], [6], [7], [9], [12], [19], [20], [23] compute the mean dissimilarity as the contrast value and simply treat this contrast value as the final saliency value. Li et al. [30] propose to compute the saliency value by adding segmentation results. Jiang et al. [15] propose to train a random forest against all kinds of region features to predict the region's saliency value. Wei et al. [10] adopt a novel geodesic distance to compute the region dissimilarity and regard it as the saliency value. Tang et al. [31] propose to fuse several existing saliency maps into a new saliency map. In this paper, we assert that there exists a differentiation between contrast and saliency, and devise different contrast transformations to extract contrast features from diverse dissimilarity matrices.

III. GENERALIZED FRAMEWORK
This section presents the architecture of our salient object detection framework, as shown in Fig. 3. As illustrated in Fig. 3 (b), the first step is to over-segment an input image into N regions. Next, a set of distinct dissimilarity matrices are computed (Fig. 3 (c)), where three kinds of dissimilarity definitions and three contexts are shown for illustration purposes. As introduced earlier, only global dissimilarity matrices are calculated in practice, from which local and


boundary dissimilarity matrices are easily derived. After region dissimilarity computation, nine dissimilarity matrices are fed into the contrast transformation block in (d), where four contrast transformations are shown. Then, an ultra-contrast feature matrix is generated through the whole pipeline, as illustrated in (e). It is worth noting that the number of contexts, region dissimilarity definitions, and contrast transformations can be freely combined.

The ultra-contrast based saliency map is finally obtained by transforming the ultra-contrast feature matrix into saliency values as follows:

$$s(i) = P(r_i) = \frac{1}{1 + e^{-\omega^T u(r_i)}}, \quad i \in r_i \tag{1}$$
where i refers to an image pixel, r_i is the region to which pixel i belongs, ω is the learned weight vector of a logistic regression model, and u(r_i) is the ultra-contrast feature of region r_i.

According to the architecture of the proposed framework, the p-th element of the ultra-contrast feature vector u_p(r_i) is determined by three key components: the context R(r_i), which refers to the regions used to compute the dissimilarity; the region dissimilarity type Dis_n(r_i, r_j) between r_i and r_j ∈ R(r_i); and the contrast transformation F, which determines how the ultra-contrast is derived from the region's dissimilarities. Thus, the ultra-contrast feature extraction framework can be defined as follows:

$$u_p(r_i) = F_m(\mathrm{Dis}_n(r_i, r_j)), \quad r_j \in R(r_i) \tag{2}$$

The dimension of the ultra-contrast feature is equal to K_r * K_d * K_s, where K_r, K_d, and K_s represent the number of context types, the number of region dissimilarity types and the number of contrast transformation types. Next, we present the three key blocks in detail.

IV. ULTRA-CONTRAST FEATURE
A. REGION DISSIMILARITY MATRIX - Dis(r_i, r_j)
In this subsection, we present the details of the dissimilarity calculation block, which encodes saliency cues into region dissimilarity definitions. Intuitively, it is hard to determine which single saliency cue makes a region conspicuous in an image. To illustrate this point explicitly, we show four kinds of saliency cues in Fig. 4, where red and green squares represent salient and background regions respectively. In Fig. 4 (a), only color makes the egg distinct from the other eggs, while Fig. 4 (b) shows an example where a region is salient because its texture is different from other regions. Two kinds of high-level saliency cues are shown in Fig. 4 (c) and (d). The first one (c) is referred to as edge saliency, which means that if there exists a strong object edge, then two regions lying on different sides of such an edge likely belong to the object and the background respectively. The last one (d) is known as object saliency, where we identify a region as salient because it is a semantic object; here the semantic object refers to the person in the image. Based on these considerations, it is evident that none of these saliency cues can deal with all four situations at the same time. Thus, in the following, we propose a set of diverse dissimilarity definitions to encode different saliency cues. On the other hand, instead of utilizing sophisticated tricks to integrate multiple saliency cues, we propose to produce one N × N matrix from each dissimilarity equally.

FIGURE 4. Transformation from saliency cues to region dissimilarity definitions. Red and green squares refer to the regions belonging to salient objects and backgrounds, respectively. (a) Color dissimilarity, (b) Texture dissimilarity, (c) Edge dissimilarity, (d) Object dissimilarity.

1) SPATIAL DISSIMILARITY
As the first dissimilarity measure, we consider the simple spatial distance between regions. Clearly, regions r_i and r_j are more dissimilar when they are farther from each other. Thus, the spatial dissimilarity is defined as:

$$\mathrm{Dis}_S(r_i, r_j) = d(l(r_i), l(r_j)) \tag{3}$$

where d(·, ·) is the Euclidean distance, and l(r_i) is the centre location of region r_i.

2) COLOR DISSIMILARITY
Intuitively, if two regions in an image have similar colors, the probability that they belong to the same object is higher than if they do not. The color dissimilarity between region r_i and region r_j is defined as follows:

$$\mathrm{Dis}_C(r_i, r_j) = d(c(r_i), c(r_j)) \tag{4}$$

where d(·, ·) is a color distance function.

3) TEXTURE DISSIMILARITY
For simplicity, the texture dissimilarity is computed as the distance between feature vectors that depict region texture:

$$\mathrm{Dis}_T(r_i, r_j) = d(t(r_i), t(r_j)) \tag{5}$$

where t(r_i) is the texture descriptor of region r_i, which includes the mean value of gradient magnitude and gradient direction, and the histogram of gradient magnitude and gradient direction.
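To make these three low-level definitions concrete, the sketch below computes the corresponding global dissimilarity matrices of Eqs. (3)–(5) with NumPy. The region descriptors (centers l(r_i), mean colors c(r_i), texture vectors t(r_i)) are hypothetical placeholders for what the over-segmentation stage would produce, and the Euclidean distance is used as one possible choice of d(·, ·).

```python
import numpy as np

def pairwise_euclidean(features):
    """Return the N x N matrix of Euclidean distances between region descriptors.

    features: (N, D) array, one D-dimensional descriptor per region.
    """
    diff = features[:, None, :] - features[None, :, :]   # (N, N, D)
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical region descriptors produced by the over-segmentation stage.
N = 200                                    # number of superpixels
centers = np.random.rand(N, 2)             # l(r_i): normalized (x, y) region centers
mean_lab = np.random.rand(N, 3)            # c(r_i): mean color of each region
texture = np.random.rand(N, 16)            # t(r_i): gradient statistics / histograms

dis_spatial = pairwise_euclidean(centers)   # Eq. (3)
dis_color = pairwise_euclidean(mean_lab)    # Eq. (4)
dis_texture = pairwise_euclidean(texture)   # Eq. (5)

# Each of these is a global dissimilarity matrix; local and boundary contexts are
# later read out as sub-matrices of it (Section IV-B).
```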


4) EDGE DISSIMILARITY
From Fig. 4 (c), we can see that two regions are dissimilar when there is a salient edge between them. Therefore, we define our first high-level dissimilarity definition, called edge dissimilarity. To compute edge dissimilarity, we first need to obtain an edge map using an edge detection algorithm [32], which outputs a probability edge map, denoted as E, where each pixel has a probability of being an edge pixel. The edge dissimilarity is computed as follows:

$$\mathrm{Dis}_E(r_i, r_j) = \frac{\sum_{p \in P(r_i, r_j)} E(p)}{|P(r_i, r_j)|} \tag{6}$$

where P(r_i, r_j) refers to the pixel set containing all pixels which lie on the boundary between r_i and r_j, | · | stands for the cardinality operation, and E(p) represents the probability of pixel p being an edge pixel. The numerator and the denominator in equation (6) can be regarded as the edge strength and the edge length between r_i and r_j, respectively. This definition can be interpreted as follows: when the edge strength is larger, the probability of there being a salient edge between these two regions is higher, hence the edge strength is proportional to the region dissimilarity. When the edge length is longer, the two regions are more likely to be adjacent, so it is inversely proportional to the region dissimilarity.

5) OBJECT DISSIMILARITY
From Fig. 4 (d), it can be noted that all the above dissimilarity definitions have difficulty depicting the situation where two regions belong to the same face or the same person. To express high-level region dissimilarity, we exploit an off-the-shelf object detection algorithm, selective search [33], which produces hundreds of windows with a high probability of belonging to objects. It can be observed that if two regions lie in the same object detection window, there is a great chance that both regions are parts of a salient object. We first define that a region r_i lies in a detection box b_n, denoted as r_i ∈ b_n, if the center pixel of the region lies within the detection box. Then the object dissimilarity is defined as follows:

$$\mathrm{Dis}_O(r_i, r_j) = \frac{\sum_{n=1}^{N} ([r_i \in b_n] \wedge [r_j \notin b_n]) \vee ([r_i \notin b_n] \wedge [r_j \in b_n])}{N} \tag{7}$$

where [·] is an indicator function, ∧ is a logical conjunction operator, ∨ is a logical disjunction operator, and N is the number of detection boxes in the image. This definition means that we count the frequency with which regions r_i and r_j do not lie in the same box, i.e., a box covers one of them but not the other.

B. REGION CONTEXT - R(r_i)
In most cases, context refers to a set of regions from which the saliency value of a specific region is inferred. For example, the global context refers to all regions in an image. When the global context is used to detect salient regions, the saliency of a specific region depends on all other regions in the same image. In this paper, we integrate three existing contexts in our framework. Specifically, if the global context is applied, we obtain an N × N dissimilarity matrix for one dissimilarity type. The local context refers to the neighborhood regions of a region and it generates an N × N_L dissimilarity matrix, where N_L is the number of neighborhood regions. Similarly, the boundary context produces an N × N_B dissimilarity matrix, where N_B refers to the number of boundary regions. In this work, we define a region as a boundary region if any of its pixels lies on any image boundary.

After dissimilarity calculation, a set of N × N dissimilarity matrices are generated. These matrices correspond to global dissimilarity matrices, since they contain all pairs of region dissimilarities in an image. We integrate the three existing context types, i.e., local, global, and boundary contexts. It is worth noting that the local dissimilarity matrix can be derived from the global dissimilarity matrix once the local neighborhood system is defined. The same holds for the boundary context. Thus, although three contexts are encoded in our framework, we only need to compute the global dissimilarity matrix in practice.

C. CONTRAST TRANSFORMATION - F(Dis(r_i, r_j))
After the dissimilarity calculation block, a set of distinct dissimilarity matrices are obtained. In this section, we aim to transform each region dissimilarity matrix into region contrast features via the proposed contrast transformations. The design of the contrast transformations must obey three rules: 1) Consistency. No matter how the size of the region dissimilarity matrix varies, the length of the contrast feature vector must be consistent. 2) Diversity. Different transformations must be distinct from each other. 3) Discriminability. They must have the ability to discriminate salient regions from backgrounds. To ensure the diversity of the proposed contrast transformations, we take advantage of distinct classic image contrast definitions [34]. The consistency of the extracted contrast features is guaranteed by unified matrix operations on the dissimilarity matrix. To show good discriminability, we analyze the characteristics of the dissimilarity distribution and unveil the probabilistic features of salient and background regions respectively. To better explain these transformations, we show an example of the dissimilarity distributions of both salient regions and backgrounds in Fig. 5, where the red and blue colors denote the salient regions and backgrounds respectively. In the following, we present the details of our proposed diverse contrast transformations.
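Before turning to the individual transformations, the sketch below illustrates one way to evaluate the two high-level dissimilarities of Eqs. (6) and (7). The probability edge map, the shared-boundary pixel sets, and the detection boxes are assumed inputs (in the paper they come from the structured edge detector [32] and selective search [33]); the data structures used here are illustrative choices, not the authors' implementation.

```python
import numpy as np

def edge_dissimilarity(edge_map, boundary_pixels):
    """Eq. (6): mean edge probability along the shared boundary of two regions.

    edge_map: (H, W) probability edge map E.
    boundary_pixels: dict {(i, j): (M, 2) array of (y, x) pixels on the boundary of r_i and r_j}.
    Returns a dict {(i, j): Dis_E(r_i, r_j)} for adjacent region pairs.
    """
    dis = {}
    for (i, j), pixels in boundary_pixels.items():
        ys, xs = pixels[:, 0], pixels[:, 1]
        dis[(i, j)] = edge_map[ys, xs].sum() / len(pixels)   # edge strength / edge length
    return dis

def object_dissimilarity(centers, boxes):
    """Eq. (7): fraction of detection boxes that contain exactly one of the two regions.

    centers: (N, 2) region center pixels (y, x).
    boxes:   (K, 4) detection boxes (y0, x0, y1, x1).
    """
    K = len(boxes)
    inside = np.array([
        (centers[:, 0] >= y0) & (centers[:, 0] <= y1) &
        (centers[:, 1] >= x0) & (centers[:, 1] <= x1)
        for (y0, x0, y1, x1) in boxes])            # (K, N) membership [r_i in b_n]
    xor = inside[:, :, None] ^ inside[:, None, :]  # box covers one region but not the other
    return xor.sum(axis=0) / K                     # (N, N) Dis_O matrix
```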


FIGURE 5. Dissimilarity distributions of salient and background regions, denoted with red and blue colors respectively.

1) AVERAGE CONTRAST
Pavel et al. [35] formulated the root mean square contrast of an image I, which is defined as the standard deviation of the pixel intensities: $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(I_i - \bar{I})^2}$, where I_i and $\bar{I}$ refer to the intensity of the i-th pixel and the average intensity of all pixels in the image I respectively, and N is the number of pixels in the image. Essentially, this contrast definition refers to the mean value of intensity dissimilarities, and it is simple to switch this definition from the image level to the region level. On the other hand, it can be seen from Fig. 5 that the mean dissimilarity of a salient region (0.5 ∼ 0.6) is higher than that of a background region (0.2 ∼ 0.3). Thus, it has the ability to discriminate salient objects from backgrounds. In conclusion, we define our Average Contrast, denoted as C_A, to represent the average value of the dissimilarity distribution of a region r_i as follows:

$$C_A(r_i) = \frac{1}{|R(r_i)|}\sum_{r_j \in R(r_i)} \mathrm{Dis}(r_i, r_j) \tag{8}$$

where |R(r_i)| refers to the number of related regions, depending on the type of context.

2) MICHELSON CONTRAST
The Michelson contrast [36] is defined as $\frac{I_{max} - I_{min}}{I_{max} + I_{min}}$, where I_max and I_min represent the highest and lowest luminance in an image respectively. The denominator of the Michelson formula, I_max + I_min, is a normalization factor for each image, and the numerator is the difference between the maximum and the minimum over the whole image. In practice, this definition is designed to represent the intensity range of an image. On the other hand, the dissimilarity range of a salient region is distinct from that of backgrounds, because there are hundreds of kinds of salient objects while the number of background classes is scarce, such as the sky. Based on these considerations, we propose to depict the dissimilarity range of a region, so that we can learn which dissimilarity range salient objects should fall in. To be specific, we define the Michelson Contrast, denoted as C_M, to represent the dissimilarity range of a region as follows:

$$C_M(r_i) = \frac{\max\limits_{r_j \in R(r_i)} \mathrm{Dis}(r_i, r_j) - \min\limits_{r_j \in R(r_i)} \mathrm{Dis}(r_i, r_j)}{\max\limits_{r_j \in R(r_i)} \mathrm{Dis}(r_i, r_j) + \min\limits_{r_j \in R(r_i)} \mathrm{Dis}(r_i, r_j)} \tag{9}$$

3) VARIANCE CONTRAST
The issue of the contrast of complex scenes at different spatial frequencies, in the context of image processing and perception, was addressed explicitly by Hess [37] and defined in the Fourier domain as $C(u, v) = \frac{2A(u, v)}{DC}$, where A(u, v) is the amplitude of the Fourier transform of an image, u and v are the horizontal and vertical spatial frequency coordinates respectively, and DC is the zero-frequency component. The amplitude of the Fourier transform of an image can reflect the luminance fluctuation in the image, and the ‘‘DC term’’, corresponding to zero frequency, represents the average brightness. Therefore, the whole definition is akin to depicting the luminance fluctuation in an image. On the other hand, from Fig. 5 (c) we can see that the dissimilarity distribution of a background region is flatter than that of salient regions. Thus, the standard deviation of a salient region's dissimilarity distribution is obviously larger than that of a background region. Inspired by this, we propose our third contrast to characterize the fluctuation of the dissimilarity distribution. For simplicity and consistency, we define the Variance Contrast, abbreviated as C_V, to compute the variance of the region dissimilarities as follows:

$$C_V(r_i) = \frac{1}{|R(r_i)|}\sum_{r_j \in R(r_i)} \left(\mathrm{Dis}(r_i, r_j) - \overline{\mathrm{Dis}}(R(r_i))\right)^2 \tag{10}$$

4) SKEWNESS CONTRAST
One of the oldest luminance contrast statistics, the Weber contrast [38], is defined as $\frac{I - I_b}{I_b}$, where I and I_b represent the luminance of the object and the luminance of the immediately adjacent background respectively. It is commonly used in cases where small objects are presented on a large uniform background, i.e., the average luminance is approximately equal to the background luminance. However, most images do not contain such uniform backgrounds. In order to characterize the backgrounds, we analyze the statistics of the dissimilarity distribution. We observe that if a region belongs to the background, there must be a large number of similar regions in the image, so that most of its dissimilarities are small. We show this observation in Fig. 5, where the histograms of dissimilarities for three superpixels are plotted; one belongs to a salient object while the other two belong to the background. Specifically, the red histogram refers to the dissimilarity distribution of a salient region, while the purple histogram refers to that of a background region. From these histograms, we notice that the histogram of a background region's dissimilarities is a more positively skewed distribution, where most dissimilarities are approximately zero.

Motivated by this, we define our Skewness Contrast, denoted as C_K, to depict the skewness of a region's dissimilarities as follows:

$$C_K(r_i) = \frac{E\!\left[\left(\mathrm{Dis}(r_i, r_j) - \overline{\mathrm{Dis}}(R(r_i))\right)^3\right]}{\sqrt{\frac{1}{|R(r_i)|}\sum_{r_j \in R(r_i)}\left(\mathrm{Dis}(r_i, r_j) - \overline{\mathrm{Dis}}(R(r_i))\right)^2}} \tag{11}$$

where E is the expectation operator.
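To make the four transformations concrete, the following sketch evaluates Eqs. (8)–(11) for a single region from one row of a dissimilarity matrix, and contrasts a background-like distribution with a salient-like one. It is a minimal NumPy sketch under the definitions above; the small epsilon guards and the synthetic beta-distributed rows are illustrative additions.

```python
import numpy as np

def contrast_transformations(dis_row, eps=1e-12):
    """Map one region's dissimilarities Dis(r_i, r_j), r_j in R(r_i), to the four
    contrast values of Eqs. (8)-(11): average, Michelson, variance, skewness."""
    d = np.asarray(dis_row, dtype=float)
    c_a = d.mean()                                              # Eq. (8)  Average Contrast
    c_m = (d.max() - d.min()) / (d.max() + d.min() + eps)       # Eq. (9)  Michelson Contrast
    c_v = ((d - d.mean()) ** 2).mean()                          # Eq. (10) Variance Contrast
    c_k = ((d - d.mean()) ** 3).mean() / (np.sqrt(c_v) + eps)   # Eq. (11) Skewness Contrast
    return np.array([c_a, c_m, c_v, c_k])

# Example: a background-like region (many near-zero dissimilarities, positively skewed)
# versus a salient-like region (larger mean, wider spread), in the spirit of Fig. 5.
rng = np.random.default_rng(0)
background_row = rng.beta(1.5, 8.0, size=300)
salient_row = rng.beta(5.0, 4.0, size=300)
print(contrast_transformations(background_row))
print(contrast_transformations(salient_row))
```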


FIGURE 6. Visualization of diverse contrast transformations.
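Putting the pieces together, the sketch below assembles the ultra-contrast feature of Eq. (2) by stacking the four contrast statistics over every dissimilarity matrix and context index set, and maps the result to a saliency value with the logistic model of Eq. (1). The context index sets and the weight vector are placeholders; in the paper the weights are learned as described in Section V.

```python
import numpy as np

def four_contrasts(d):
    """The four contrast statistics of Eqs. (8)-(11) for one dissimilarity vector."""
    m, v = d.mean(), d.var()
    return np.array([m,
                     (d.max() - d.min()) / (d.max() + d.min() + 1e-12),
                     v,
                     ((d - m) ** 3).mean() / (np.sqrt(v) + 1e-12)])

def ultra_contrast_feature(dis_matrices, contexts, i):
    """Eq. (2): u(r_i) stacks F_m(Dis_n(r_i, r_j)) over dissimilarity types,
    context index sets R(r_i), and the four transformations."""
    feats = [four_contrasts(dis[i, ctx]) for dis in dis_matrices for ctx in contexts]
    return np.concatenate(feats)

def saliency(u, omega):
    """Eq. (1): logistic mapping from the ultra-contrast feature to a saliency value."""
    return 1.0 / (1.0 + np.exp(-omega @ u))

# Placeholder example: 3 dissimilarity matrices and 3 contexts for region i = 7.
N, i = 50, 7
dis_matrices = [np.random.rand(N, N) for _ in range(3)]
contexts = [np.arange(N),                                # global: all regions
            np.arange(max(0, i - 5), min(N, i + 6)),     # local: neighboring regions
            np.arange(10)]                               # boundary: regions touching the border
u = ultra_contrast_feature(dis_matrices, contexts, i)    # length 3 * 3 * 4 = 36
omega = np.zeros(u.size)                                 # learned weights of Eq. (1)
print(saliency(u, omega))                                # 0.5 for zero weights
```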

5) VISUALIZATION OF ULTRA-CONTRAST
To visualize these contrast transformations, we generate saliency maps using each contrast transformation separately in Fig. 6. It can be seen that different transformations (b)(c)(d)(e) highlight different salient regions. For example, Average Contrast (b) highlights the bigger salient object while ignoring the smaller one, because the smaller one has a lower mean region dissimilarity compared with the bigger one. In contrast, Michelson Contrast (d) and Variance Contrast (e) highlight the smaller salient object. These results clearly demonstrate the diversity of the proposed transformations. Finally, our ultra-contrast saliency map (f), combining all transformations, successfully detects all parts of the salient objects.

V. TRAINING
In this section, we present how to train a logistic regression model against the proposed ultra-contrast features. The training samples are extracted by the following procedure: we first define an indicator image I_i for region r_i as a binary mask where label 1 represents the pixels that are in region r_i and 0 the other pixels. Then we compute an overlap score [39] between the indicator image I_i and the corresponding ground-truth binary mask S as follows:

$$s(I_i, S) = \frac{|I_i \cap S|}{|I_i \cup S|} \tag{12}$$

If the overlap score is above τ, the region is treated as a positive sample. It is worth noting that when τ is set to 0, there exists an unbalanced situation between the number of positive samples and the number of negative samples. The training will fail if we directly use the native loss function of logistic regression [40], because of the symmetry with which it penalizes the two types of mistakes equally. Similar to [40], our solution is to modify the training loss. Specifically, we multiply the loss of the positive samples by a parameter β:

$$\omega^*(\beta) = \arg\min_{\omega} \; \frac{1}{2}\omega^T\omega + \sum_{i=1}^{N_n} \log\left(1 + e^{\omega^T x_i}\right) + \beta \sum_{j=1}^{N_p} \log\left(1 + e^{-\omega^T x_j}\right) \tag{13}$$

where N_p and N_n refer to the numbers of positive samples and negative samples respectively.

VI. UNIFIED SALIENT OBJECT SEGMENTATION
In saliency cut [6], an initial saliency map is used to produce a rough saliency mask, and then iterative GrabCut [41] is run to get the final segmentation result. The disadvantage of this method is that it does not make the most of the saliency map, as it only uses the saliency map to locate the salient object roughly. As a result, the initial saliency map becomes less important. In the following, we present a unified algorithm to produce a segmentation mask from a saliency map by making the most of the saliency map. Specifically, the saliency value of a pixel is directly treated as the probability of it belonging to a salient object. Besides that, only a smoothness term is considered.

By combining these two factors, we construct the objective function as follows:

$$E(y) = \sum_{i \in V} U(y_i) + \lambda \sum_{i,j \in E} V(y_i, y_j) \tag{14}$$

where y_i is the segmentation variable of pixel i.

The first term U(y_i) is the data potential, designed to constrain the final segmentation y_i to be close to the saliency map; therefore it is computed as follows:

$$U(y_i) = -\log\left(s_i y_i + (1 - s_i)(1 - y_i)\right) \tag{15}$$

where s_i is the saliency value of pixel i.

The second term is the edge-preserving smoothness potential, designed to smooth pixels within salient objects and backgrounds separately; here we use the Kolmogorov and Zabih [42] interaction energy:

$$V(y_i, y_j) = d(i, j)\,[y_i \neq y_j]\,e^{-\beta |f_i - f_j|^2} \tag{16}$$

where d(i, j) represents the distance between regions i and j, [·] refers to the indicator function, β is the parameter that weights the feature distance, and f_i is the color feature vector of region i.

Note that the objective function is a submodular binary discrete optimization problem, and it can be minimized using graph cuts [42].
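The energy in Eqs. (14)–(16) is a standard submodular binary MRF, so it can be minimized exactly with an s–t min-cut. The sketch below is a minimal illustration of this segmentation step, not the authors' MATLAB implementation: it assumes the PyMaxflow package as the max-flow solver, works on a 4-connected pixel grid (so d(i, j) = 1 for neighbors), and uses λ = 20 as in Section VII together with an illustrative value for the feature-weighting parameter β.

```python
import numpy as np
import maxflow  # PyMaxflow package (assumed dependency); any s-t min-cut solver works

def segment_from_saliency(sal, image, lam=20.0, beta=10.0, eps=1e-6):
    """Minimize E(y) = sum_i U(y_i) + lam * sum_{(i,j)} V(y_i, y_j) of Eqs. (14)-(16)
    on a 4-connected pixel grid and return a binary mask (1 = salient object).

    sal:   (H, W) saliency map with values in [0, 1] (the probabilities s_i).
    image: (H, W, C) color features f_i used in the smoothness term.
    """
    H, W = sal.shape
    s = np.clip(sal.astype(float), eps, 1.0 - eps)
    g = maxflow.Graph[float]()
    g.add_nodes(H * W)                      # node ids are 0 .. H*W-1 in insertion order
    ids = np.arange(H * W).reshape(H, W)

    # Data potential, Eq. (15): cost -log(s_i) for label 1, -log(1 - s_i) for label 0.
    # With add_tedge(node, cap_source, cap_sink), cap_source is paid when the node ends
    # up on the sink side (label 1) and cap_sink when it ends up on the source side.
    for y in range(H):
        for x in range(W):
            g.add_tedge(int(ids[y, x]), -np.log(s[y, x]), -np.log(1.0 - s[y, x]))

    # Smoothness potential, Eq. (16): paid only when neighboring labels differ.
    def pair_weight(p, q):
        diff = image[p].astype(float) - image[q].astype(float)
        return lam * np.exp(-beta * np.dot(diff, diff))

    for y in range(H):
        for x in range(W):
            if x + 1 < W:
                w = pair_weight((y, x), (y, x + 1))
                g.add_edge(int(ids[y, x]), int(ids[y, x + 1]), w, w)
            if y + 1 < H:
                w = pair_weight((y, x), (y + 1, x))
                g.add_edge(int(ids[y, x]), int(ids[y + 1, x]), w, w)

    g.maxflow()
    return np.array([[g.get_segment(int(ids[y, x])) for x in range(W)] for y in range(H)])
```

The pixel loops are written for clarity rather than speed; a vectorized grid construction would be used in practice.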


TABLE 2. Performance of the proposed ultra-contrast feature compared with DRFI [15] feature on four datasets across different segmentation levels.
(a) MSRA-B. (b) PASCAL-S. (c) ECSSD. (d) DUT-OMRON.

VII. EXPERIMENTS
To evaluate the effectiveness of the proposed method, we design three groups of comparative experiments. First, we compare the performance of the proposed ultra-contrast features with DRFI's [15] features across different segmentation levels. Second, salient object detection experiments are conducted in comparison with state-of-the-art algorithms. Third, salient object segmentation experiments are conducted to investigate the performance of the proposed method in the object segmentation task.

To analyze the components of the proposed framework, we design a group of ablation analysis experiments. Furthermore, an efficiency analysis of the proposed method is presented. Finally, we show that our ultra-contrast features are complementary to deep learning based features.

A. SETUP
1) DATASETS
We use four salient object detection datasets: MSRA-B [19], PASCAL-S [43], ECSSD [9] and DUT-OMRON [11]. MSRA-B contains 5000 images in total and we follow the splitting rule of [15] to train our model, where 2500, 500 and 2000 images are chosen as the training, validation, and test sets respectively. PASCAL-S contains 850 images and is used to assess the performance of algorithms on images with multiple objects and high background clutter. DUT-OMRON consists of 5,168 challenging images, most of which have relatively cluttered backgrounds. ECSSD comprises 1000 complex images.

2) PARAMETERS
We set the region size of SLIC [44] to [15, 18, 20, 23, 25, 30, 35, 40, 45, 50, 60, 70, 80] to obtain multi-scale segmentations, where the number of segmentations is the same as in DRFI [15]. The λ in the energy function (14) is set to 20 and the trade-off parameter β in equation (13) is set to 1.5.

3) IMPLEMENTATION DETAILS
To compute the color dissimilarity matrices, we adopt multiple region color features: the average value and histogram in the RGB, Lab, and HSV color spaces respectively, which produce 6 dissimilarity matrices. The average value and histogram of gradient magnitudes and gradient directions are used to compute the texture dissimilarity, which produces 4 dissimilarity matrices. In total, the dimension of the final ultra-contrast feature vector is 156, which is equal to 13 × 3 × 4 ((the number of dissimilarity matrices) × (the number of contexts) × (the number of contrast transformations)).

B. PERFORMANCE COMPARISON
1) ULTRA-CONTRAST FEATURE VS DRFI FEATURE
We first compare the performance of our ultra-contrast feature with the DRFI [15] feature across different segmentation levels; DRFI is the leading contrast-based algorithm over the seven datasets reported in [45]. For a fair comparison, we apply the same segmentations. Besides, we do not use any post-processing techniques, such as spatial smoothing. Three standard metrics [45], the maximal F-measure (MAXF), the adaptive threshold F-measure (ADAPF) and the MAE, are used to evaluate the performance, and the results are reported in Table 2, where UC refers to our proposed ultra-contrast feature and regsz refers to the minimum region size set in SLIC.

It can be seen from Table 2 that our UC outperforms DRFI in terms of the three metrics across all segmentation levels on the four datasets, which proves the effectiveness of the proposed ultra-contrast features. Specifically, our UC is about 9%, 8%, and 8% better than DRFI in terms of the ADAPF, MAXF, and MAE on MSRA-B when regsz is set to 20. On the other hand, the performance of our UC is more robust than DRFI when the segmentation scale changes. For example, the difference between the highest MAE (11.84%) and the lowest MAE (11.58%) of our UC on MSRA-B is within 1%. In contrast, the MAE of DRFI changes dramatically from 20.31% to 12.79%. The reason is that the DRFI feature vector is generated by simply concatenating various plain region features, such as region area, region color dissimilarity and the responses of filters. On the contrary, the three unified blocks in the proposed framework make our UC feature more robust across different segmentation scales.

2) SALIENT OBJECT DETECTION RESULTS
For the salient object detection experiment, four standard measurements [45], i.e., the precision-recall (PR) curve, the maximal F-measure (MAXF), the adaptive threshold F-measure (ADAPF) and the mean absolute error (MAE), are adopted.
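For reference, a sketch of the three scalar metrics on a single saliency map is given below. It follows the common definitions of the benchmark [45]; the choice β² = 0.3 in the F-measure and the use of twice the mean saliency as the adaptive threshold are conventions widely used in the saliency literature and are assumptions here, not values stated in this paper.

```python
import numpy as np

def f_measure(sal, gt, thresh, beta2=0.3):
    """Weighted F-measure of a binarized saliency map against a binary ground truth."""
    pred = sal >= thresh
    tp = np.logical_and(pred, gt).sum()
    precision = tp / (pred.sum() + 1e-12)
    recall = tp / (gt.sum() + 1e-12)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-12)

def evaluate(sal, gt):
    """MAXF: best F over all thresholds; ADAPF: F at an adaptive threshold; MAE."""
    sal = sal.astype(float)
    gt = gt.astype(bool)
    maxf = max(f_measure(sal, gt, t) for t in np.linspace(0, 1, 256))
    adapf = f_measure(sal, gt, min(1.0, 2.0 * sal.mean()))
    mae = np.abs(sal - gt).mean()
    return maxf, adapf, mae
```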


TABLE 3. Saliency detection results of the proposed method in terms of three metrics on four public datasets in compared with 10 state-of-the-art
algorithms. For each metric, the top three results are shown in red, green and blue, respectively. (a) MSRA-B. (b) PASCAL-S. (c) ECSSD. (d) DUT-OMRON.

FIGURE 7. Precision recall curves of the proposed ultra-contrast based algorithm compared with other 10 state-of-the-art saliency detection
algorithms on four datasets. From left to right : (a) MSRA-B, (b) PASCAL-S, (c) ECSSD, (d) DUT-OMRON.

We compare our ultra-contrast based algorithm, denoted as UC, with 10 state-of-the-art contrast-based algorithms: SF [8], LPS [17], RC [6], DSR [13], MC [14], GS [10], MR [11], SO [12], DRFI [15] and HSD [9], where DRFI [15] is the leading algorithm over all seven datasets reported in [45].

The MAXF, ADAPF and MAE scores on the four datasets are shown in Table 3 and the corresponding PR curves are plotted in Fig. 7. In Fig. 7, our UC consistently outperforms these algorithms in terms of both precision and recall by a significant margin on the four datasets, while DRFI takes the second position. This proves the effectiveness of the proposed algorithm compared with state-of-the-art contrast-based algorithms. Observing Table 3, our UC also consistently outperforms these algorithms in terms of the three metrics on the four datasets. On MSRA-B, our UC is the only method whose MAE is below 10%.

3) SALIENT OBJECT SEGMENTATION RESULTS
In order to compare with other salient object detection algorithms in terms of segmentation performance, we input all the other saliency maps into our energy function (14) to produce segmentation masks. The salient object segmentation experiment is conducted on MSRA-B [19]. The most well-known segmentation evaluation metric, the intersection-over-union (IoU) score [39], is adopted and denoted as S_o.

Table 4 shows the results of our UC compared with 10 state-of-the-art saliency detection algorithms, SF [8], RC [6], LPS [17], DSR [13], GS [10], MC [14], MR [11], SO [12], HSD [9] and DRFI [15], and another salient object segmentation algorithm, SaliencyCut [6]. It can be seen that the mean overlap score of our UC is 75.53%, which is 5% higher than that of the best competing salient object segmentation algorithm, i.e., SaliencyCut [6].
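The overlap score S_o used above is the standard intersection-over-union between the predicted mask and the ground truth; a minimal sketch is:

```python
import numpy as np

def iou(mask, gt):
    """Intersection-over-union S_o between a predicted binary mask and the ground truth."""
    mask, gt = mask.astype(bool), gt.astype(bool)
    return np.logical_and(mask, gt).sum() / (np.logical_or(mask, gt).sum() + 1e-12)
```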


FIGURE 8. Visual comparison of the proposed method UC with other state-of-the-art contrast based methods. From left to
right: (a) source image, (b) RC [6], (c) DSR [13], (d) GS [10], (e) HSD [9], (f) MC [14], (g) MR [11], (h) SO [12], (i) DRFI [15], (j) LPS [17],
(k) SaliencyCut [6], (l) UC, (m) Ground truth mask.

TABLE 4. Results of salient object segmentation of the proposed method compared with other 11 algorithms on MSRA-B.

TABLE 5. Quantitative result of ablation experiment regarding different regions dissimilarities on MSRA-B.

4) QUALITATIVE RESULTS
Fig. 8 illustrates a set of salient object segmentation examples produced by our method compared with the aforementioned algorithms. The first group, denoted with a green border, presents an input image containing a simple salient object. It can be seen that most methods have the ability to segment simple salient objects accurately. The second group, denoted with a blue border, contains seven images with complex salient objects. As can be seen from this group, except for our UC, most contrast-based methods fail to detect these complex salient objects. Take the person in the fifth row of this group as an illustration, where the ‘‘face’’, ‘‘cloth’’ and ‘‘bottle’’ are highlighted by the proposed object dissimilarity, texture dissimilarity, and edge dissimilarity, respectively. In contrast, the color of the ‘‘cloth’’ is similar to its background, so RC [6] misses it. On the other hand, because this ‘‘person’’ lies on the border of the image, the methods [11], [13], [14] based on the boundary context miss the boundary part of the salient object.

C. ABLATION ANALYSIS
The proposed framework is capable of integrating different types of region dissimilarities, contrast transformations, and contexts. In order to survey the effectiveness of these components, we conduct ablation experiments on MSRA-B by removing one component each time. Since there are three key factors in our framework, i.e., region dissimilarity, contrast transformation and context encoding, we evaluate one factor at a time while keeping the other factors unchanged.

1) REGION DISSIMILARITY
The results of the ablation analysis regarding region dissimilarity are presented in Table 5. From this table, first of all, we can see that all the proposed region dissimilarities contribute to the final result, which justifies the choice of utilizing multiple region dissimilarity types. For color dissimilarity, there is about a 1% drop in terms of MAXF when removing any kind of color dissimilarity. This result proves the effectiveness of using multiple color channels. Removing texture dissimilarity also leads to a 1% decrease. One of our high-level region dissimilarities, object dissimilarity, causes a moderate decrease. The edge and spatial dissimilarities can be seen as complementary cues, since there is a relatively smaller decrease when removing them separately.

The visual comparisons of the different region dissimilarity types are shown in Fig. 9, where each column corresponds to one dissimilarity; mRGB and mGradMag are chosen to represent the color and texture dissimilarities respectively. From the second column we can see that, when only spatial dissimilarity is used, the algorithm always segments the objects lying in the center of an image. Take the first row of Fig. 9 as an illustration for color dissimilarity, where the goal is to segment a yellow flower from a cluttered background. In RGB color space, pure yellow is [255 255 0], and the background color ranges from dark green to black, which contains a very low red component; because


of this, the RGB color distance is very critical for segmenting it from the background. Though the gradient dissimilarity performs poorly on this dataset, in some special cases the gradient dissimilarity is very useful for segmenting textured salient objects from smooth backgrounds. One of our high-level region dissimilarities, object dissimilarity, has already shown its efficacy in the quantitative results, and here we take the third row for demonstration: the salient object in this picture is a baby, who consists of several parts, the white head, the yellow clothes and the white feet. The mRGB dissimilarity not only highlights the baby but also highlights the light, which has a similar color, and the edge dissimilarity is strongly biased by the edge generated by the skyline. Only when object dissimilarity treats the baby as a whole object do we obtain the best result. Our other high-level cue, edge dissimilarity, always highlights strong-edged objects, as shown in the fourth row. We also show an example in the last row where almost all of our region dissimilarities miss the person at the bottom right corner of the image. Concretely, the spatial dissimilarity misses the person because it cannot detect a salient object lying on the border. All the other dissimilarities only highlight the central bus. A limitation of the proposed method is that it cannot predict the priority level of salient objects. Thus, it may fail to highlight all salient objects when the salient objects in an image have different saliency levels.

FIGURE 9. Visual comparison of the proposed framework regarding different region dissimilarities. From left to right: (a) Source image, (b) Spatial, (c) RGB, (d) Texture, (e) Object, (f) Edge, (g) All, (h) Ground Truth. The green and red frames denote successes and failures respectively.

2) CONTRAST TRANSFORMATION
To prove the effectiveness of our proposed contrast transformations, we conduct the ablation experiment by removing one contrast transformation at a time. The results are shown in Table 6, and it can be seen that all contrast transformations contribute to the final result. Concretely, removing C_A (Average Contrast) causes about a 2% decrease in the final result in terms of MAXF, which demonstrates that a region's saliency value is strongly influenced by its average dissimilarity with other regions. For C_M (Michelson Contrast) and C_V (Variance Contrast), the final MAXF scores both lose about 1% when removing them separately. From these results, we can see that their union significantly enhances their separate performances, which proves the idea that the proposed contrast transformations are complementary to each other.

TABLE 6. Quantitative result of ablation experiment regarding contrast transformations on MSRA-B.

In order to study the effectiveness of the proposed contrast transformations intuitively, we show some examples of salient object segmentation using each contrast transformation separately in Fig. 10. In the first row, only Average Contrast segments the salient object exactly, while the other transformations are disturbed by the cluttered illumination in the background. From a statistical point of view, the mean value of a probability distribution is more robust to disturbances and noise than its variance and skewness. In this example, the cluttered lights can be regarded as disturbances to the salient regions; therefore, only Average Contrast suppresses them. In probability theory and statistics, skewness and variance are more sensitive to distinctive samples than the mean value. Such characteristics are just right for highlighting distinctive parts of a complex salient object. In the second row, we show an example where only Skewness Contrast segments the distinctive part (the trousers) of the complex salient object (the person) successfully. This supports the idea that Skewness Contrast can be used to highlight distinctive parts of a complex salient object. An example of total failure is shown in the last row, where all of our methods detect different parts of the apple. We argue that the ground truth annotation of this image is very ambiguous.

FIGURE 10. Visual comparison of the proposed framework regarding different contrast transformations. From left to right: (a) Source image, (b) Average Contrast, (c) Skewness Contrast, (d) Michelson Contrast, (e) Variance Contrast, (f) All, (g) Ground Truth. The green and red frames denote successes and failures respectively.

3) CONTEXT
Table 7 shows the ablation experiment regarding the different contexts. It can be observed that the union of the three contexts outperforms each context separately, which justifies the choice of encoding different contexts in the same framework. Besides, we can see that the boundary context achieves the best result among the three types.


Fig. 11 exhibits several examples of salient object segmentation when the context is set to only one of the three types. From the first row we can see that, when a salient object occupies a larger part of the image than the background, the algorithm fails when the context is set to only global or only local. The second and third rows show successful examples when only the global and only the local context are used, respectively. In the second row, only the global context treats the leaves of the flowers as background, because they have lower global contrast. A successful example when only the local context is used is shown in the third row, where the kettle has a distinctive local region dissimilarity because its local region contains two white cups, while it is less dissimilar when compared with the global and boundary regions. We also show a failure case in the last row, where the polar bear does not have distinctive region dissimilarities in any of the proposed three context types. This example illustrates the limitation of the proposed method: it may fail when a salient object is similar to its global, local and boundary regions.

FIGURE 11. Visual comparison of the proposed framework regarding different contexts. From left to right: (a) Source image, (b) Global, (c) Local, (d) Boundary, (e) All, (f) Ground Truth. The green and red frames denote successes and failures respectively.

TABLE 7. Quantitative result of ablation experiment regarding different contexts on MSRA-B.

D. EFFICIENCY ANALYSIS
To analyze the efficiency of the proposed framework, we briefly illustrate the implementation flowchart in Fig. 12, which can be divided into three categories. The first one is the pre-computation procedure, including the extraction of the superpixel map, the edge map, and the objectness map. The second and third categories are region dissimilarity computation and contrast transformation respectively.

The experiments are run on a single thread of an Intel i5 CPU at 2.60 GHz and the code is written in MATLAB. All the results are reported in Table 8. First of all, it takes less than 0.001 s to compute each of the four contrast transformations, because they only involve simple operations, such as the mean and variance computation of a small matrix. Even though there are dozens of dissimilarity matrices, the cost is only about 0.001 seconds per dissimilarity matrix. In summary, it takes around 3.5 seconds to compute a saliency map from the input.

E. COMPLEMENTARY TO DEEP FEATURES
In this section, we show that our ultra-contrast features can be utilized to improve the performance of a deep learning based saliency detection model. In [24], a DCNN model is learned to predict the saliency value of a region. We first fetch its pre-trained features from the concat7 layer preceding the final output layers, whose dimension is 8192. We then fine-tune logistic regression models with and without our ultra-contrast features. For the sake of generalization capability, the results are reported on the PASCAL-S dataset. Since the dimension of the deep feature is much larger than that of our feature (8192 vs 204), it might not be optimal to directly concatenate the two features. Thus, we implement another two feature concatenation strategies. Concretely, we use PCA to reduce the dimension of the deep features to 200 and 1000, denoted MDL200 and MDL1k respectively. From Fig. 13, we can see that the proposed ultra-contrast features improve the performance of the deep features in all three combinations, which proves that the proposed ultra-contrast features are complementary to deep learning based features.
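A minimal sketch of this fusion strategy is given below, assuming the 8192-dimensional deep features have already been extracted and using scikit-learn's PCA and logistic regression as stand-ins for the fine-tuning step; the arrays are placeholders and the exact training protocol of [24] is not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def fuse_and_train(deep_feats, uc_feats, labels, n_components=200):
    """Reduce the deep features with PCA, concatenate the ultra-contrast features,
    and train a logistic regression saliency model on region labels."""
    reduced = PCA(n_components=n_components).fit_transform(deep_feats)  # e.g. MDL200
    fused = np.hstack([reduced, uc_feats])
    model = LogisticRegression(max_iter=1000).fit(fused, labels)
    return model

# Placeholder data: 1000 training regions with 8192-d deep and 204-d UC features.
deep_feats = np.random.randn(1000, 8192)
uc_feats = np.random.randn(1000, 204)
labels = np.random.randint(0, 2, size=1000)
model = fuse_and_train(deep_feats, uc_feats, labels, n_components=200)
```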

FIGURE 12. Implementation process of the proposed framework.

TABLE 8. Elapsed time of all proposed modules.


FIGURE 13. Performances of integrating our proposed ultra-contrast features into deep learning based features.

VIII. CONCLUSIONS
In this paper, we focus on solving a problem that exists in contrast-based salient object detection algorithms: they miss some parts of complex salient objects. To achieve this goal, we propose a unified salient object detection framework, which includes three main blocks: dissimilarity definition, contrast transformation, and context selection. The dissimilarity definition block is designed to integrate multiple and diverse saliency cues. Then, the contrast transformation block is utilized to transform the dissimilarity matrices into region-level ultra-contrast features. Finally, the ultra-contrast features are transformed to saliency values. The experimental results show that the proposed ultra-contrast saliency detection framework significantly outperforms existing contrast-based algorithms. Furthermore, we show that deep learning based features can be integrated into our framework, and the experimental results demonstrate that the proposed ultra-contrast features are complementary to deep learning based features.

REFERENCES
[1] L. Itti, C. Koch, and E. Niebur, ‘‘A model of saliency-based visual attention for rapid scene analysis,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.
[2] S. Belongie, G. Mori, and J. Malik, ‘‘Matching with shape contexts,’’ in Statistics and Analysis of Shapes. Boston, MA, USA: Birkhäuser, 2006, pp. 81–105.
[3] A. Rabinovich, A. Vedaldi, and S. J. Belongie, ‘‘Does image segmentation improve object categorization?’’ Dept. Comput. Sci. Eng., Univ. California, San Diego, San Diego, CA, USA, Tech. Rep. CS2007-0908, 2007.
[4] L. Marchesotti, C. Cifarelli, and G. Csurka, ‘‘A framework for visual saliency detection with applications to image thumbnailing,’’ in Proc. IEEE Int. Conf. Comput. Vis., Sep./Oct. 2009, pp. 2232–2239.
[5] H. Jiang, ‘‘Human pose estimation using consistent max covering,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 9, pp. 1911–1918, Sep. 2011.
[6] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu, ‘‘Global contrast based salient region detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 409–416.
[7] Y. Zhai and M. Shah, ‘‘Visual attention detection in video sequences using spatiotemporal cues,’’ in Proc. ACM Int. Conf. Multimedia, 2006, pp. 815–824.
[8] F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung, ‘‘Saliency filters: Contrast based filtering for salient region detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 733–740.
[9] Q. Yan, L. Xu, J. Shi, and J. Jia, ‘‘Hierarchical saliency detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1155–1162.
[10] Y. Wei, F. Wen, W. Zhu, and J. Sun, ‘‘Geodesic saliency using background priors,’’ in Proc. Eur. Conf. Comput. Vis., 2012, pp. 29–42.
[11] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang, ‘‘Saliency detection via graph-based manifold ranking,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3166–3173.
[12] W. Zhu, S. Liang, Y. Wei, and J. Sun, ‘‘Saliency optimization from robust background detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2814–2821.
[13] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang, ‘‘Saliency detection via dense and sparse reconstruction,’’ in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 2976–2983.
[14] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang, ‘‘Saliency detection via absorbing Markov chain,’’ in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1665–1672.
[15] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, ‘‘Salient object detection: A discriminative regional feature integration approach,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 2083–2090.
[16] N. Tong, H. Lu, X. Ruan, and M.-H. Yang, ‘‘Salient object detection via bootstrap learning,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1884–1892.
[17] H. Li, H. Lu, Z. Lin, X. Shen, and B. Price, ‘‘Inner and inter label propagation: Salient object detection in the wild,’’ IEEE Trans. Image Process., vol. 24, no. 10, pp. 3176–3186, Oct. 2015.
[18] S. Goferman, L. Zelnik-Manor, and A. Tal, ‘‘Context-aware saliency detection,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 10, pp. 1915–1926, Oct. 2012.
[19] T. Liu et al., ‘‘Learning to detect a salient object,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, pp. 353–367, Feb. 2011.
[20] K. Wang, L. Lin, J. Lu, C. Li, and K. Shi, ‘‘PISA: Pixelwise image saliency by aggregating complementary appearance contrast measures with edge-preserving coherence,’’ IEEE Trans. Image Process., vol. 24, no. 10, pp. 3019–3033, Oct. 2015.
[21] P. Jiang, H. Ling, J. Yu, and J. Peng, ‘‘Salient region detection by UFO: Uniqueness, focusness and objectness,’’ in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1976–1983.
[22] X. Li, Y. Li, C. Shen, A. Dick, and A. van den Hengel, ‘‘Contextual hypergraph modeling for salient object detection,’’ in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 3328–3335.
[23] Y.-F. Ma and H.-J. Zhang, ‘‘Contrast-based image attention analysis by using fuzzy growing,’’ in Proc. ACM Int. Conf. Multimedia, 2003, pp. 374–381.
[24] R. Zhao, W. Ouyang, H. Li, and X. Wang, ‘‘Saliency detection by multi-context deep learning,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1265–1274.
[25] L. Wang, H. Lu, X. Ruan, and M.-H. Yang, ‘‘Deep networks for saliency detection via local estimation and global search,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3183–3192.
[26] Y. Ren, Z. Wang, and M. A. Xu, ‘‘Learning-based saliency detection of face images,’’ IEEE Access, vol. 5, pp. 6502–6514, 2017.
[27] C. Scharfenberger, A. G. Chung, A. Wong, and D. A. Clausi, ‘‘Salient region detection using self-guided statistical non-redundancy in natural images,’’ IEEE Access, vol. 4, pp. 48–60, 2016.
[28] H. Du, Z. Liu, H. Song, L. Mei, and Z. Xu, ‘‘Improving RGBD saliency detection using progressive region classification and saliency fusion,’’ IEEE Access, vol. 4, pp. 8987–8994, 2016.
[29] F. Meng, H. Li, Q. Wu, B. Luo, and K. N. Ngan, ‘‘Weakly supervised part proposal segmentation from multiple images,’’ IEEE Trans. Image Process., vol. 26, no. 8, pp. 4019–4031, Aug. 2017.
[30] H. Li, F. Meng, and K. N. Ngan, ‘‘Co-salient object detection from multiple images,’’ IEEE Trans. Multimedia, vol. 15, no. 8, pp. 1896–1909, Dec. 2013.
[31] L. Tang, H. Li, and T. Chen, ‘‘Extract salient objects from natural images,’’ in Proc. IEEE Int. Symp. Intell. Signal Process. Commun. Syst., Dec. 2010, pp. 1–4.
[32] P. Dollár and C. L. Zitnick, ‘‘Fast edge detection using structured forests,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 8, pp. 1558–1570, Aug. 2015.
[33] K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, ‘‘Segmentation as selective search for object recognition,’’ in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 1879–1886.

14882 VOLUME 6, 2018


L. Tang et al.: Salient Object Detection and Segmentation via Ultra-Contrast

LIANGZHI TANG received the B.Sc. and M.Sc. degrees from the School of Electronic Engineering, University of Electronic Science and Technology of China, in 2008 and 2011, respectively, where he is currently pursuing the Ph.D. degree, under the supervision of Prof. H. Li. His research interests include saliency detection, object segmentation, and deep convolutional neural networks.

FANMAN MENG (S'12–M'14) received the Ph.D. degree in signal and information processing from the University of Electronic Science and Technology of China, Chengdu, China, in 2014. From 2013 to 2014, he was with the Division of Visual and Interactive Computing, Nanyang Technological University, Singapore, as a Research Assistant. He is currently an Associate Professor with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China. He has authored or co-authored numerous technical articles in well-known international journals and conferences. His research interests include image segmentation and object detection. He is a member of the IEEE CAS Society. He received the Best Student Paper Honorable Mention Award at the 12th Asian Conference on Computer Vision, Singapore, in 2014, and the Top 10% Paper Award at the IEEE International Conference on Image Processing, Paris, France, in 2014.

QINGBO WU received the B.E. degree in education of applied electronic technology from Hebei Normal University in 2009 and the Ph.D. degree in signal and information processing from the University of Electronic Science and Technology of China in 2015. In 2014, he was a Research Assistant with the Image and Video Processing Laboratory, Chinese University of Hong Kong. From 2014 to 2015, he was a Visiting Scholar with the Image & Vision Computing Laboratory, University of Waterloo, Waterloo, ON, Canada. He is currently a Lecturer with the School of Electronic Engineering, University of Electronic Science and Technology of China. His research interests include image/video coding, quality evaluation, and perceptual modeling and processing.

NII LONGDON SOWAH received the B.Sc. degree in computer engineering from the Kwame Nkrumah University of Science and Technology in 2009 and the M.Sc. degree in communication engineering from the University of Electronic Science and Technology of China in 2012, where he is currently pursuing the Ph.D. degree with the Intelligent Visual Information Processing and Communication Laboratory. His research interests include object tracking and image clustering.

KAI TAN received the M.A.Sc. degree from Shandong Normal University in 2013. He is currently pursuing the Ph.D. degree in signal and information processing with the University of Electronic Science and Technology of China, under the supervision of Prof. H. Li. His research interests include visual attention, image recognition, crowd analysis, neural networks, and deep learning.

HONGLIANG LI (M'06–SM'11) received the Ph.D. degree in electronics and information engineering from Xi'an Jiaotong University, Xi'an, China, in 2005. From 2005 to 2006, he was with the Visual Signal Processing and Communication Laboratory, The Chinese University of Hong Kong (CUHK), Hong Kong, as a Research Associate. From 2006 to 2008, he was a Post-Doctoral Fellow with the Visual Signal Processing and Communication Laboratory, CUHK. He is currently a Professor with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China. He has authored or co-authored numerous technical articles in well-known international journals and conferences. He is a Co-Editor of the book Video Segmentation and its Applications (Springer, 2011). His research interests include image segmentation, object detection, image and video coding, visual attention, and multimedia communication systems. He has been involved in many professional activities. He served as a TPC member for a number of international conferences, e.g., ICME 2013, ICME 2012, ISCAS 2013, PCM 2007, PCM 2009, and VCIP 2010. He served as the Technical Program Co-Chair of VCIP 2016 and ISPACS 2009, the General Co-Chair of ISPACS 2010, the Publicity Co-Chair of the IEEE VCIP 2013, and the Local Chair of the IEEE ICME 2014. He is a member of the Editorial Board of the Journal of Visual Communication and Image Representation, and an Area Editor of Signal Processing: Image Communication, Elsevier Science.