Article info

Article history:
Received 5 December 2016
Revised 26 February 2017
Accepted 29 March 2017
Available online 2 April 2017

Keywords: Superpixels; Superpixel segmentation; Image segmentation; Perceptual grouping; Benchmark; Evaluation

Abstract

Superpixels group perceptually similar pixels to create visually meaningful entities while heavily reducing the number of primitives for subsequent processing steps. Owing to these properties, superpixel algorithms have received much attention since their naming in 2003 (Ren and Malik, 2003). Today, publicly available superpixel algorithms have turned into standard tools in low-level vision. As such, and due to their quick adoption in a wide range of applications, appropriate benchmarks are crucial for algorithm selection and comparison. Until now, the rapidly growing number of algorithms as well as varying experimental setups hindered the development of a unifying benchmark. We present a comprehensive evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark focusing on fair comparison and designed to provide new insights relevant for applications. To this end, we explicitly discuss parameter optimization and the importance of strictly enforcing connectivity. Furthermore, by extending well-known metrics, we are able to summarize algorithm performance independent of the number of generated superpixels, thereby overcoming a major limitation of available benchmarks. In addition, we discuss runtime, robustness against noise, blur and affine transformations, implementation details as well as aspects of visual quality. Finally, we present an overall ranking of superpixel algorithms which redefines the state-of-the-art and enables researchers to easily select appropriate algorithms and the corresponding implementations, which themselves are made publicly available as part of our benchmark at http://www.davidstutz.de/projects/superpixel-benchmark/.

© 2017 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.cviu.2017.03.007
1077-3142/© 2017 Elsevier Inc. All rights reserved.
2 D. Stutz et al. / Computer Vision and Image Understanding 166 (2018) 1–27
(among others Achanta et al., 2012; van den Bergh et al., 2013b; Grundmann et al., 2010) have been adapted to videos and image volumes – a survey and comparison of some of these so-called supervoxel algorithms can be found in Xu and Corso (2012).

In view of this background, most authors do not make an explicit difference between superpixel algorithms and oversegmentation algorithms, i.e. superpixel algorithms are usually compared with oversegmentation algorithms and the terms have been used interchangeably (e.g. Levinshtein et al., 2009; Neubert and Protzel, 2012; Schick et al., 2012). Veksler et al. (2010) distinguish superpixel algorithms from segmentation algorithms running in "oversegmentation mode". More recently, Neubert and Protzel (2013) distinguish superpixel algorithms from oversegmentation algorithms with respect to their behavior on video sequences.

In general, it is very difficult to draw a clear line between superpixel algorithms and oversegmentation algorithms. Several oversegmentation algorithms were not intended to generate superpixels; nevertheless, some of them share many characteristics with superpixel algorithms. We use the convention that superpixel algorithms offer control over the number of generated superpixels while segmentation algorithms running in "oversegmentation mode" do not. This covers the observations made by Veksler et al. and Neubert and Protzel.

In general, most authors (e.g. Achanta et al., 2012; Levinshtein et al., 2009; Lui et al., 2011; Schick et al., 2012) agree on the following requirements for superpixels:

– Partition. Superpixels should define a partition of the image, i.e. superpixels should be disjoint and assign a label to every pixel.
– Connectivity. Superpixels are expected to represent connected sets of pixels.
– Boundary Adherence. Superpixels should preserve image boundaries. Here, the appropriate definition of image boundaries may depend on the application.
– Compactness, Regularity and Smoothness. In the absence of image boundaries, superpixels should be compact, placed regularly and exhibit smooth boundaries.
– Efficiency. Superpixels should be generated efficiently.
– Controllable Number of Superpixels. The number of generated superpixels should be controllable.

Some of these requirements may be formulated implicitly, e.g. Lui et al. (2011) require that superpixels may not lower the achievable performance of subsequent processing steps. Achanta et al. (2012) even require superpixels to increase the performance of subsequent processing steps. Furthermore, the above requirements should be fulfilled with as few superpixels as possible (Lui et al., 2011).

Contributions. We present an extensive evaluation of 28 algorithms on 5 datasets regarding visual quality, performance, runtime, implementation details and robustness to noise, blur and affine transformations. In particular, we demonstrate the applicability of superpixel algorithms to indoor, outdoor and person images. To ensure a fair comparison, parameters have been optimized on separate training sets; as the number of generated superpixels heavily influences parameter optimization, we additionally enforced connectivity. Furthermore, to evaluate superpixel algorithms independent of the number of superpixels, we propose to integrate over commonly used metrics such as Boundary Recall (Martin et al., 2004), Undersegmentation Error (Achanta et al., 2012; Levinshtein et al., 2009; Neubert and Protzel, 2012) and Explained Variation (Moore et al., 2008). Finally, we present a ranking of the superpixel algorithms considering multiple metrics and independent of the number of generated superpixels.

Outline. In Section 2 we discuss important related work regarding the comparison of superpixel algorithms and subsequently, in Section 3, we present the evaluated superpixel algorithms. In Section 4 we discuss relevant datasets and introduce the used metrics in Section 5. Then, Section 6 briefly discusses problems related to parameter optimization before we present experimental results in Section 7. We conclude with a short summary in Section 8.

2. Related work

Our efforts towards a comprehensive comparison of available superpixel algorithms are motivated by the lack thereof within the literature. Notable publications in this regard are Schick et al. (2012), Achanta et al. (2012), Neubert and Protzel (2012), and Neubert and Protzel (2013). Schick et al. (2012) introduce a metric for evaluating the compactness of superpixels, while Achanta et al. (2012) as well as Neubert and Protzel (2012) concentrate on using known metrics. Furthermore, Neubert and Protzel evaluate the robustness of superpixel algorithms with respect to affine transformations such as scaling, rotation, shear and translation. However, they do not consider ground truth for evaluating robustness. More recently, Neubert and Protzel (2013) used the Sintel dataset (Butler et al., 2012) to evaluate superpixel algorithms based on optical flow in order to assess the stability of superpixel algorithms in video sequences.

Instead of relying on an application-independent evaluation of superpixel algorithms, some authors compared the use of superpixel algorithms for specific computer vision tasks. Achanta et al. (2012) use the approaches of Gould et al. (2008) and Gonfaus et al. (2010) to assess superpixel algorithms as a pre-processing step for semantic segmentation. Similarly, Strassburg et al. (2015) evaluate superpixel algorithms based on the semantic segmentation approach described in Tighe and Lazebnik (2010). Weikersdorfer et al. (2012) use superpixels as basis for the normalized cuts algorithm (Shi and Malik, 2000) applied to classical segmentation and compare the results with the well-known segmentation algorithm by Arbeláez et al. (2011). Koniusz and Mikolajczyk (2009), in contrast, evaluate superpixel algorithms for interest point extraction.

In addition to the above publications, authors of superpixel algorithms usually compare their proposed approaches to existing superpixel algorithms. Usually, the goal is to demonstrate superiority with regard to specific aspects. However, the used parameter settings are usually not reported, or default parameters are used, and implementations of metrics differ. Therefore, these experiments are not comparable across publications.

Complementing the discussion of superpixel algorithms in the literature so far, and similar to Schick et al. (2012), Achanta et al. (2012) and Neubert and Protzel (2012), we concentrate on known metrics to give a general, application-independent evaluation of superpixel algorithms. However, we consider the minimum/maximum as well as the standard deviation in addition to metric averages in order to assess the stability of superpixel algorithms, as also considered by Neubert and Protzel (2012; 2013). Furthermore, we explicitly document parameter optimization and strictly enforce connectivity to ensure fair comparison. In contrast to Neubert and Protzel (2012), our robustness experiments additionally consider noise and blur and make use of ground truth for evaluation. Finally, we render three well-known metrics independent of the number of generated superpixels, allowing us to present a final ranking of superpixel algorithms.

3. Algorithms

In our comparison, we aim to discuss popular algorithms with publicly available implementations alongside less popular and more recent algorithms for which implementations were partly provided by the authors. To address the large number of superpixel algorithms, we find a rough categorization of the discussed
algorithms helpful. Based on the categorization by Achanta et al. (2012) – who presented (to the best of our knowledge) the first and only categorization of superpixel algorithms – we categorized algorithms according to their high-level approach. We found that this categorization provides an adequate abstraction of algorithm details, allowing us to give the reader a rough understanding of the different approaches, while being specific enough to relate categories to experimental results, as done in Section 7. For each algorithm, we present the used acronym, the reference and its number of citations. In addition, we provide implementation details such as the programming language, the used color space, the number of parameters as well as whether the number of superpixels, the compactness and the number of iterations (if applicable) are controllable.

Watershed-based. These algorithms are based on the watershed algorithm (W) and usually differ in how the image is pre-processed and how markers are set. The number of superpixels is determined by the number of markers, and some watershed-based superpixel algorithms offer control over the compactness, for example WP or CW.

Graph-based. Graph-based algorithms treat the image as an undirected graph and partition this graph based on edge weights, which are often computed as color differences or similarities. The algorithms differ in the partitioning algorithm; for example, FH, ERS and POISE exhibit a bottom-up merging of pixels into superpixels, while NC and CIS use cuts and PB uses elimination (Carr and Hartley, 2009).

Path-based. For these algorithms, the number of superpixels is controllable; however, compactness usually is not. Often, these algorithms use edge information: PF uses discrete image gradients and TPS uses edge detection as proposed in Dollár and Zitnick (2013).

Clustering-based. For these algorithms, the number of superpixels is controllable, compactness can be controlled and the iterations can usually be aborted at any point.
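The clustering-based idea can be sketched in a few lines of NumPy: a simplified, SLIC-like k-means over color and position. This is an illustrative toy, not one of the benchmarked implementations; note that K, the compactness weight and the iteration count are exactly the controllable quantities mentioned above.

```python
import numpy as np

def slic_like(image, K=16, compactness=10.0, iterations=5):
    """Simplified SLIC-style clustering: k-means over (position, color)
    features, with spatial distances weighted by a compactness factor."""
    H, W = image.shape[:2]
    S = int(np.sqrt(H * W / K))  # initial grid step
    ys = np.arange(S // 2, H, S)
    xs = np.arange(S // 2, W, S)
    # initial cluster centers on a regular grid: (y, x, color...)
    centers = np.array([[y, x, *image[y, x]] for y in ys for x in xs],
                       dtype=float)

    yy, xx = np.mgrid[0:H, 0:W]
    feats = np.concatenate([yy[..., None], xx[..., None], image], axis=2)
    feats = feats.reshape(-1, image.shape[2] + 2).astype(float)

    w = compactness / S  # weight of spatial vs. color distance
    for _ in range(iterations):
        # distance of every pixel to every center (fine for small images)
        d_space = ((feats[:, None, :2] - centers[None, :, :2]) ** 2).sum(-1)
        d_color = ((feats[:, None, 2:] - centers[None, :, 2:]) ** 2).sum(-1)
        labels = np.argmin(d_color + (w ** 2) * d_space, axis=1)
        # update each center as the mean of its assigned pixels
        for k in range(len(centers)):
            members = feats[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels.reshape(H, W)
```

Note that, as discussed in Section 6, the number of actually generated superpixels may deviate from K, and the resulting labels are not guaranteed to be connected.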
4. Datasets
Table 1
Basic statistics of the used datasets: the total number of images, the number of training and test images, and the size of the images (averaged per dimension). The number of images for the SUNRGBD dataset excludes the images from the NYUV2 dataset. For the NYUV2 and SUNRGBD datasets, training and test images have been chosen uniformly at random if necessary. Note that the odd numbers used for the NYUV2 dataset are for no special reason.
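The uniform random train/test selection mentioned in the caption can be sketched as follows; `n_train` is a hypothetical parameter whose per-dataset values would come from Table 1.

```python
import numpy as np

def split_uniformly(n_images, n_train, seed=42):
    """Choose training images uniformly at random; the rest are test images."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_images)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])
```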
Overall, high Rec represents better boundary adherence with respect to the ground truth boundaries, i.e. higher is better. In practice, a boundary pixel in S is matched to an arbitrary boundary pixel in G within a local neighborhood of size (2r + 1) × (2r + 1), with r being 0.0025 times the image diagonal rounded to the next integer (e.g. r = 1 for the BSDS500 dataset).

Undersegmentation Error (UE) (Achanta et al., 2012; Levinshtein et al., 2009; Neubert and Protzel, 2012) measures the "leakage" of superpixels with respect to G and, therefore, implicitly also measures boundary adherence. Here, "leakage" refers to the overlap of superpixels with multiple, nearby ground truth segments. The original formulation by Levinshtein et al. (2009) can be written as

UE_Levin(G, S) = \frac{1}{|G|} \sum_{G_i} \frac{\left(\sum_{S_j \cap G_i \neq \emptyset} |S_j|\right) - |G_i|}{|G_i|}   (2)

where the inner term represents the "leakage" of superpixel S_j with respect to G. However, some authors (Achanta et al., 2012; Neubert and Protzel, 2012) argue that Eq. (2) penalizes superpixels overlapping only slightly with neighboring ground truth segments and is not constrained to lie in [0, 1]. Achanta et al. (2012) suggest to threshold the "leakage" term of Eq. (2) and only consider those superpixels S_j with a minimum overlap of \frac{5}{100} \cdot |S_j|. Both van den Bergh et al. (2013a) and Neubert and Protzel (2012) propose formulations not suffering from the above drawbacks. In the former,

UE_Bergh(G, S) = \frac{1}{N} \sum_{S_j} \Big| S_j - \arg\max_{G_i} |S_j \cap G_i| \Big|,   (3)

each superpixel is assigned to the ground truth segment with the largest overlap, and only the "leakage" with respect to other ground truth segments is considered. Therefore, UE_Bergh corresponds to (1 − ASA) – with ASA being the Achievable Segmentation Accuracy as described below. The latter,

UE_NP(G, S) = \frac{1}{N} \sum_{G_i} \sum_{S_j \cap G_i \neq \emptyset} \min\{|S_j \cap G_i|, |S_j - G_i|\},   (4)

is not directly equivalent to (1 − ASA); however, UE_NP and ASA are still strongly correlated as we will see later. All formulations have in common that lower UE refers to less "leakage" with respect to the ground truth, i.e. lower is better. In the following we use UE ≡ UE_NP.

Explained Variation (EV) (Moore et al., 2008) quantifies the quality of a superpixel segmentation without relying on ground truth. As image boundaries tend to exhibit strong change in color and structure, EV assesses boundary adherence independent of human annotations. EV is defined as

EV(S) = \frac{\sum_{S_j} |S_j| (\mu(S_j) - \mu(I))^2}{\sum_{x_n} (I(x_n) - \mu(I))^2}   (5)

where μ(S_j) and μ(I) are the mean color of superpixel S_j and the image I, respectively. As a result, EV quantifies the variation of the image explained by the superpixels, i.e. higher is better.

Compactness (CO) has been introduced by Schick et al. (2012) to evaluate the compactness of superpixels:

CO(G, S) = \frac{1}{N} \sum_{S_j} |S_j| \frac{4 \pi A(S_j)}{P(S_j)^2}.   (6)

CO compares the area A(S_j) of each superpixel S_j with the area of a circle (the most compact 2-dimensional shape) with the same perimeter P(S_j), i.e. higher is better.

While we will focus on Rec, UE, EV and CO, further notable metrics are briefly discussed in the following. Achievable Segmentation Accuracy (ASA) (Lui et al., 2011) quantifies the achievable accuracy for segmentation using superpixels as pre-processing step:

ASA(G, S) = \frac{1}{N} \sum_{S_j} \max_{G_i} \{|S_j \cap G_i|\};   (7)

Intra-Cluster Variation (ICV) (Benesova and Kottman, 2014) computes the average variation within each superpixel:

ICV(S) = \frac{1}{|S|} \sum_{S_j} \frac{\sqrt{\sum_{x_n \in S_j} (I(x_n) - \mu(S_j))^2}}{|S_j|};   (8)

Mean Distance to Edge (MDE) (Benesova and Kottman, 2014) refines Rec by also considering the distance to the nearest boundary pixel within the ground truth segmentation:

MDE(G, S) = \frac{1}{N} \sum_{x_n \in B(G)} dist_S(x_n)   (9)

where B(G) is the set of boundary pixels in G, and dist_S is a distance transform of S.

5.1. Expressiveness and chosen metrics

Due to the large number of available metrics, we examined their expressiveness in order to systematically concentrate on few relevant metrics. We found that UE tends to correlate strongly with ASA, which can be explained by Eqs. (4) and (7), respectively. In particular, simple calculation shows that ASA strongly resembles (1 − UE). Surprisingly, UE_NP does not correlate with UE_Levin, suggesting that either both metrics reflect different aspects of superpixels, or UE_Levin unfairly penalizes some superpixels as suggested in Achanta et al. (2012) and Neubert and Protzel (2012). Unsurprisingly, MDE correlates strongly with Rec, which can also be explained by their respective definitions. In this sense, MDE does not provide additional information. Finally, ICV does not correlate with EV, which may be attributed to the missing normalization in Eq. (8) when compared to EV. This also results in ICV not being comparable across images, as the intra-cluster variation is not related to the overall variation within the image. Owing to these considerations, we concentrate on Rec, UE and EV for the presented experiments. Details can be found in Appendix C.1.

5.2. Average Miss Rate, Average Undersegmentation Error and Average Unexplained Variation

As the chosen metrics inherently depend on the number of superpixels, we seek a way of quantifying the performance with respect to Rec, UE and EV independent of K and in a single plot per dataset. In order to summarize performance over a given interval [K_min, K_max], we consider MR = (1 − Rec), UE and UV = (1 − EV). Here, the first corresponds to the Boundary Miss Rate (MR) and the last, Unexplained Variation (UV), quantifies the variation in the image not explained by the superpixels. We use the area below these curves in [K_min, K_max] = [200, 5200] to quantify performance independent of K. In Section 7.2, we will see that these metrics appropriately summarize the performance of superpixel algorithms. We denote these metrics by Average Miss Rate (AMR), Average Undersegmentation Error (AUE) and Average Unexplained Variation (AUV) – note that this refers to an average over K. By construction (and in contrast to Rec and EV), lower AMR, AUE and AUV is better, making side-by-side comparison across datasets easy.

6. Parameter optimization

For the sake of fair comparison, we optimized parameters on the training sets depicted in Table 1. Unfortunately, parameter optimization
had to rely on discrete grid search, jointly optimizing Rec and UE, i.e. minimizing (1 − Rec) + UE. As Rec and UE operate on different representations (boundary pixels and superpixel segmentations, respectively), the additive formulation ensures that algorithms balance both metrics. For example, we observed that using a multiplicative formulation allows superpixel algorithms to drive (1 − Rec) towards zero while disregarding UE. We optimized parameters for K ∈ {400, 1200, 3600} and interpolated linearly in between (however, we found that for many algorithms, parameters are consistent across different values of K). Optimized parameters also include compactness parameters and the number of iterations as well as the color space. We made sure that all algorithms at least support RGB color space for fairness. In the following, we briefly discuss the main difficulties encountered during parameter optimization, namely controlling the number of generated superpixels and ensuring connectivity.

Fig. 2. K and Rec on the training set of the BSDS500 dataset when varying parameters strongly influencing the number of generated superpixels of: LSC; CIS; VC; CRS; and PB. The parameters have been omitted and scaled for clarity. A higher number of superpixels results in increased Rec. Therefore, unnoticed superpixels inherently complicate fair comparison.
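The grid search sketched below illustrates the joint objective (1 − Rec) + UE with minimal NumPy implementations of UE_NP (Eq. (4)) and Boundary Recall. Here `segment` and `param_grid` are hypothetical placeholders standing in for any superpixel algorithm and its parameter grid, and the matching radius is fixed to r = 1 instead of 0.0025 times the image diagonal.

```python
import numpy as np

def ue_np(gt, sp):
    """Undersegmentation Error as in Eq. (4): for each ground truth segment
    G_i, overlapping superpixels S_j contribute min(|S_j ∩ G_i|, |S_j − G_i|)."""
    total = 0
    for gi in np.unique(gt):
        g_mask = gt == gi
        for sj in np.unique(sp[g_mask]):
            s_mask = sp == sj
            inter = int(np.logical_and(g_mask, s_mask).sum())
            total += min(inter, int(s_mask.sum()) - inter)
    return total / gt.size

def boundary_recall(gt, sp, r=1):
    """Fraction of ground truth boundary pixels with a superpixel boundary
    pixel within a (2r+1) x (2r+1) neighborhood. Borders are handled by
    wrap-around via np.roll, which is good enough for a sketch."""
    def boundaries(labels):
        b = np.zeros(labels.shape, dtype=bool)
        b[:-1, :] |= labels[:-1, :] != labels[1:, :]
        b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
        return b
    b_gt, b_sp = boundaries(gt), boundaries(sp)
    dilated = np.zeros_like(b_sp)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            dilated |= np.roll(np.roll(b_sp, dy, axis=0), dx, axis=1)
    return float((b_gt & dilated).sum()) / max(int(b_gt.sum()), 1)

def grid_search(segment, param_grid, images, ground_truths):
    """Pick the parameter set minimizing (1 - Rec) + UE, averaged over images."""
    best_params, best_score = None, float("inf")
    for params in param_grid:
        score = 0.0
        for image, gt in zip(images, ground_truths):
            sp = segment(image, **params)
            score += (1.0 - boundary_recall(gt, sp)) + ue_np(gt, sp)
        score /= len(images)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The additive form mirrors the discussion above: an algorithm cannot push (1 − Rec) to zero while ignoring UE, because both terms enter the score on the same scale.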
6.1. Controlling the number of generated superpixels
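Whether the desired number of superpixels is actually generated can only be checked after enforcing connectivity, since a single label may decompose into several connected components. A minimal flood-fill sketch (plain Python/NumPy; not the benchmark's implementation):

```python
import numpy as np
from collections import deque

def relabel_connected(labels):
    """Enforce connectivity: assign every 4-connected component its own
    label, so the actually generated number of superpixels can be counted."""
    H, W = labels.shape
    out = -np.ones((H, W), dtype=int)
    next_label = 0
    for y in range(H):
        for x in range(W):
            if out[y, x] != -1:
                continue
            # breadth-first flood fill over same-label neighbours
            q = deque([(y, x)])
            out[y, x] = next_label
            while q:
                cy, cx = q.popleft()
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < H and 0 <= nx < W and out[ny, nx] == -1 \
                            and labels[ny, nx] == labels[cy, cx]:
                        out[ny, nx] = next_label
                        q.append((ny, nx))
            next_label += 1
    return out, next_label
```

For a disconnected label map, the returned component count exceeds the number of distinct labels, which is exactly the effect that complicates controlling K.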
metrics AMR, AUE and AUV. Furthermore, we present experiments regarding implementation details as well as robustness against noise, blur and affine transformations. Finally, we give an overall ranking based on AMR and AUE.

7.1. Qualitative

Visual quality is best determined by considering compactness, regularity and smoothness on the one hand and boundary adherence on the other. Here, compactness refers to the area covered by individual superpixels (as captured in Eq. (6)); regularity corresponds to both the superpixels' sizes and their arrangement; and smoothness refers to the superpixels' boundaries. Figs. 5 and 6 show results on all datasets. We begin by discussing boundary adherence, in particular with regard to the difference between superpixel and oversegmentation algorithms, before considering compactness, smoothness and regularity.

The majority of algorithms provide solid adherence to important image boundaries, especially for large K. We consider the woman image – in particular, the background – and the caterpillar image in Fig. 5. Algorithms with inferior boundary adherence are easily identified as those not capturing the pattern in the background or the silhouette of the caterpillar: FH, QS, CIS, PF, PB, TPS, TP and SEAW. The remaining algorithms do not necessarily capture all image details, as for example the woman's face, but important image boundaries are consistently captured. We note that of the three evaluated oversegmentation algorithms, i.e. EAMS, FH and QS, only EAMS demonstrates adequate boundary adherence. Furthermore, we observe that increasing K results in more details being captured by all algorithms. Notable algorithms regarding boundary adherence include CRS, ERS, SEEDS, ERGC and ETPS. These algorithms are able to capture even smaller details such as the coloring of the caterpillar or elements of the woman's face.

Compactness strongly varies across algorithms, and a compactness parameter is beneficial to control the degree of compactness as it allows to gradually trade boundary adherence for compactness. We consider the caterpillar image in Fig. 5. TP, RW, W, and PF are examples for algorithms not providing a compactness parameter. While TP generates very compact superpixels and RW tends to resemble grid-like superpixels, W and PF generate highly non-compact superpixels. In this regard, compactness depends on algorithm and implementation details (e.g. grid-like initialization) and varies across algorithms. For algorithms providing control over the compactness of the generated superpixels, we find that parameter optimization has strong impact on compactness. Examples are CRS, LSC, ETPS and ERGC showing highly irregular superpixels, while SLIC, CCS, VC and WP generate more compact superpixels. For DASP and VCCS, requiring depth information, similar observations can be made on the kitchen image in Fig. 6. In spite of the influence of parameter optimization, we find that a compactness parameter is beneficial. This can best be observed in Fig. 7, showing superpixels generated by SLIC and CRS for different degrees of compactness. We observe that compactness can be increased while only gradually sacrificing boundary adherence.

We find that compactness does not necessarily induce regularity and smoothness; some algorithms, however, are able to unite compactness, regularity and smoothness. Considering the sea image in Fig. 5 for CIS and TP, we observe that compact superpixels are not necessarily arranged regularly. Similarly, compact superpixels do not need to exhibit smooth boundaries, as can be seen for PB. On the other hand, compact superpixels are often generated in a regular fashion, as can be seen for many algorithms providing a compactness parameter such as SLIC, VC and CCS. In such cases, compactness also induces smoother and more regular superpixels. We also observe that many algorithms exhibiting excellent boundary adherence such as CRS, SEEDS or ETPS generate highly irregular and non-smooth superpixels. These observations also justify the separate consideration of compactness, regularity and smoothness to judge visual quality. While the importance of compactness, regularity and smoothness may depend on the application at hand, these properties represent the trade-off between abstraction from and sensitivity to low-level image content which is inherent to all superpixel algorithms.

In conclusion, we find that the evaluated path-based and density-based algorithms as well as oversegmentation algorithms show inferior visual quality. On the other hand, clustering-based, contour evolution and iterative energy optimization algorithms mostly demonstrate good boundary adherence and some provide a compactness parameter, e.g. SLIC, ERGC and ETPS. Graph-based algorithms show mixed results – algorithms such as FH, CIS and PB show inferior boundary adherence, while ERS, RW, NC and POISE exhibit better boundary adherence. However, good boundary adherence, especially regarding details in the image, often comes at the price of lower compactness, regularity and/or smoothness, as can be seen for ETPS and SEEDS. Furthermore, compactness, smoothness and regularity are not necessarily linked and should be discussed separately.

7.1.1. Compactness

CO measures compactness; however, it does not reflect regularity or smoothness. Therefore, CO is not sufficient to objectively judge visual quality. We consider Fig. 8, showing CO on the BSDS500 and NYUV2 datasets, and we observe that CO correctly measures compactness. For example, WP, TP and CIS, exhibiting high CO, also present very compact superpixels in Figs. 5 and 6. However, these superpixels do not necessarily have to be visually appealing, i.e. may lack regularity and/or smoothness. This can exemplarily be seen for TPS, exhibiting high compactness but poor regularity, or PB, showing high compactness but inferior smoothness. Overall, we find that CO should not be considered isolated from a qualitative evaluation.

7.1.2. Depth

Depth information helps superpixels resemble the 3D-structure within the image. Considering Fig. 6, in particular both images for DASP and VCCS, we deduce that depth information may be beneficial for superpixels to resemble the 3D-structure of a scene. For example, when considering planar surfaces (e.g. the table) in both images from Fig. 6 for DASP, we clearly see that the superpixels easily align with the surface in a way perceived as 3-dimensional. For VCCS, this effect is less observable, which may be due to the compactness parameter.

7.2. Quantitative

Performance is determined by Rec, UE and EV. In contrast to most authors, we will look beyond metric averages. In particular, we consider the minimum/maximum as well as the standard deviation to get an impression of the behavior of superpixel algorithms. Furthermore, this allows us to quantify the stability of superpixel algorithms as also considered by Neubert and Protzel (2013).

Rec and UE offer a ground truth dependent overview to assess the performance of superpixel algorithms. We consider Fig. 9(a) and (b), showing Rec and UE on the BSDS500 dataset. With respect to Rec, we can easily identify top performing algorithms, such as ETPS and SEEDS, as well as low performing algorithms, such as FH, QS and PF. However, the remaining algorithms lie closely together in between these two extremes, showing (apart from some exceptions) similar performance especially for large K. Still, some algorithms perform consistently better than others, as for example ERGC, SLIC, ERS and CRS. For UE, low performing algorithms, such as PF or QS, are still easily identified while the remaining
Fig. 5. Qualitative results on the BSDS500, SBD and Fash datasets. Excerpts from the images in Fig. 1 are shown for K ≈ 400 in the upper left corner and K ≈ 1200 in the lower right corner. Superpixel boundaries are depicted in black. We judge visual quality on the basis of boundary adherence, compactness, smoothness and regularity. Boundary adherence can be judged both on the caterpillar image as well as on the woman image – the caterpillar's boundaries are hard to detect and the woman's face exhibits small details. In contrast, compactness, regularity and smoothness can be evaluated considering the background in the caterpillar and sea images.
Fig. 6. Qualitative results on the NYUV2 and SUNRGBD datasets. Excerpts from the images in Fig. 1 are shown for K ≈ 400 in the upper left corner and K ≈ 1200 in
the lower right corner. Superpixel boundaries are depicted in black. NC, RW and SEAW could not be evaluated on the SUNRGBD dataset due to exhaustive memory usage
of the corresponding MatLab implementations. Therefore, results for the NYUV2 dataset are shown. Visual quality is judged regarding boundary adherence, compactness,
smoothness and regularity. We also find that depth information, as used in DASP and VCCS, may help resemble the underlying 3D-structure.
Fig. 7. The influence of a low (left) and high (right) compactness parameter demonstrated on the caterpillar image from the BSDS500 dataset using SLIC and CRS for K ≈ 400. Superpixel boundaries are depicted in black. Superpixel algorithms providing a compactness parameter allow to trade boundary adherence for compactness.
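The compactness being traded here is quantified by CO (Section 5), an area-weighted isoperimetric quotient. A minimal sketch, assuming the perimeter is measured by counting exposed pixel edges:

```python
import numpy as np

def compactness(sp):
    """Compactness (CO): area-weighted isoperimetric quotient
    4*pi*A / P^2 per superpixel, normalized by the number of pixels N."""
    H, W = sp.shape
    N = H * W
    co = 0.0
    for sj in np.unique(sp):
        mask = sp == sj
        area = int(mask.sum())
        # perimeter: count exposed unit edges (image border included)
        padded = np.pad(mask, 1, constant_values=False)
        inner = padded[1:-1, 1:-1]
        perim = (inner & ~padded[:-2, 1:-1]).sum() \
              + (inner & ~padded[2:, 1:-1]).sum() \
              + (inner & ~padded[1:-1, :-2]).sum() \
              + (inner & ~padded[1:-1, 2:]).sum()
        co += area * (4 * np.pi * area) / perim ** 2
    return co / N
```

A perfect square scores 4πA/P² = π/4 regardless of its size; elongated or ragged superpixels score lower, a circle would score 1.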
Fig. 8. CO on the BSDS500 and NYUV2 datasets. Considering Figs. 5 and 6, CO appropriately reflects compactness. However, it does not take into account other aspects of visual quality such as regularity and smoothness. Therefore, we find that CO is of limited use in a quantitative assessment of visual quality.

algorithms tend to lie more closely together. Nevertheless, we can identify algorithms consistently demonstrating good performance, such as ERGC, ETPS, CRS, SLIC and ERS. On the NYUV2 dataset, considering Fig. 10(a) and (b), these observations can be confirmed except for minor differences, as for example the excellent performance of ERS regarding UE or the better performance of QS regarding UE. Overall, Rec and UE provide a quick overview of superpixel algorithm performance but might not be sufficient to reliably discriminate superpixel algorithms.

In contrast to Rec and UE, EV offers a ground truth independent assessment of superpixel algorithms. Considering Fig. 9(c), showing EV on the BSDS500 dataset, we observe that algorithms are dragged apart and even for large K significantly different EV values are attained. This suggests that considering ground truth independent metrics may be beneficial for comparison. However, EV cannot replace Rec or UE, as we can observe when comparing to Fig. 9(a) and (b), showing Rec and UE on the BSDS500 dataset; in particular QS, FH and CIS are performing significantly better with respect to EV than regarding Rec and UE. This suggests that EV may be used to identify poorly performing algorithms, such

Overall, we find that min Rec, max UE and min EV appropriately reflect the stability of superpixel algorithms.

The minimum/maximum of Rec, UE and EV captures lower/upper bounds on performance. In contrast, the corresponding standard deviation can be thought of as the expected deviation from the average performance. We consider Fig. 9(g)–(i), showing the standard deviation of Rec, UE and EV on the BSDS500 dataset. We can observe that in many cases good performing algorithms such as ETPS, CRS, SLIC or ERS also demonstrate low standard deviation. Oversegmentation algorithms, on the other hand, show higher standard deviation – together with algorithms such as PF, TPS, VC, CIS and SEAW. In this sense, stable algorithms can also be identified by low and monotonically decreasing standard deviation.

The variation in the number of generated superpixels is an important aspect for many superpixel algorithms. In particular, high standard deviation in the number of generated superpixels can be related to poor performance regarding Rec, UE and EV. We find that superpixel algorithms ensuring that the desired number of superpixels is met within appropriate bounds are preferable. We consider Fig. 9(j) and (k), showing max K and std K for K ≈ 400 on the BSDS500 dataset. Even after enforcing connectivity as described in Section 6.2, we observe that several implementations are not always able to meet the desired number of superpixels within acceptable bounds. Among these algorithms are QS, VC, FH, CIS and LSC. Except for the latter case, this can be related to poor performance with respect to Rec, UE and EV. Conversely, considering algorithms such as ETPS, ERGC or ERS, which guarantee that the desired number of superpixels is met exactly, this can be related to good performance regarding these metrics. To draw similar conclusions for algorithms utilizing depth information, i.e. DASP and VCCS, the reader is encouraged to consider Fig. 10(j) and (k), showing max K and std K for K ≈ 400 on the NYUV2 dataset. We can conclude that superpixel algorithms with low standard deviation in the number of generated superpixels are showing better performance in many cases.

Finally, we discuss the proposed metrics AMR, AUE and AUV (computed as the area below the MR = (1 − Rec), UE and UV = (1 − EV) curves within the interval [K_min, K_max] = [200, 5200], i.e. lower is better). We find that these metrics appropriately reflect and summarize the performance of superpixel algorithms independent of K. As can be seen in Fig. 11(a), showing AMR, AUE and AUV on the BSDS500 dataset, most of the previous observations can be confirmed. For example, we exemplarily consider SEEDS and observe low AMR and AUV, which is confirmed by Fig. 9(a) and (c), showing Rec and EV on the BSDS500 dataset, where SEEDS consistently outperforms all al-
as TPS, PF, PB or NC. On the other hand, EV is not necessarily gorithms except for ETPS. However, we can also observe higher
suited to identify well-performing algorithms due to the lack of AUE compared to algorithms such as ETPS, ERS or CRS wich is
underlying ground truth. Overall, EV is suitable to complement the also consistent with Fig. 9(b), showing UE on the BSDS500 dataset.
view provided by Rec and UE, however, should not be considered We conclude, that AMR, AUE and AUV give an easy-to-understand
in isolation. summary of algorithm performance. Furthermore, AMR, AUE and
The stability of superpixel algorithms can be quantified by AUV can be used to rank the different algorithms according to
min Rec , max UE and min EV considering the behavior for in- the corresponding metrics; we will follow up on this idea in
creasing K. We consider Fig. 9(d)–(f), showing min Rec , max UE Section 7.7.
and min EV on the BSDS500 dataset. We define the stability of The observed AMR, AUE and AUV also properly reflect the
superpixel algorithms as follows: an algorithm is considered stable difficulty of the different datasets. We consider Fig. 11 showing
if performance monotonically increases with K (i.e.monotonically , and for all five datasets. Concen-
increasing Rec and EV and monotonically decreasing UE). Further- trating on SEEDS and ETPS, we see that the relative performance
more, these experiments can be interpreted as empirical bounds (i.e.the performance of SEEDS compared to ETPS) is consistent
on the performance. For example algorithms such as ETPS, ERGC, across datasets; SEEDS usually showing higher AUE while AMR and
ERS, CRS and SLIC can be considered stable and provide good AUV are usually similar. Therefore, we observe that these metrics
bounds. In contrast, algorithms such as EAMS, FH, VC or POISE are can be used to characterize the difficulty and ground truth of the
punished by considering min Rec , max UE and min EV and cannot datasets. For example, considering the Fash dataset, we observe
be described as stable. Especially oversegmentation algorithms very high AUV compared to the other datasets, while AMR and
show poor stability. Most strikingly, EAMS seems to perform AUE are usually very low. This can be explained by the ground
especially poorly on at least one image from the BSDS500 dataset.
12 D. Stutz et al. / Computer Vision and Image Understanding 166 (2018) 1–27
Fig. 9. Quantitative experiments on the BSDS500 dataset; remember that K denotes the number of generated superpixels. Rec (higher is better) and UE (lower is better) give a concise overview of the performance with respect to ground truth. In contrast, EV (higher is better) gives a ground truth independent view on performance. While top-performers as well as poorly performing algorithms are easily identified, we provide more fine-grained experimental results by considering min Rec, max UE and min EV. These statistics can additionally be used to quantify the stability of superpixel algorithms. In particular, stable algorithms are expected to exhibit monotonically improving min Rec, max UE and min EV. The corresponding std Rec, std UE and std EV as well as max K and std K help to identify stable algorithms.
Fig. 10. Quantitative results on the NYUV2 dataset; remember that K denotes the number of generated superpixels. The presented experimental results complement the
discussion in Fig. 9 and show that most observations can be confirmed across datasets. Furthermore, DASP and VCCS show inferior performance suggesting that depth
information does not necessarily improve performance.
Fig. 11. AMR, AUE and AUV (lower is better) on the used datasets: (a) BSDS500, (b) NYUV2, (c) SBD, (d) SUNRGBD, (e) Fash. We find that AMR, AUE and AUV appropriately summarize performance independent of the number of generated superpixels. Plausible examples to consider are top-performing algorithms such as ETPS, ERS, SLIC or CRS as well as poorly performing ones such as QS and PF.
truth shown in Fig. B.17(e), i.e. the ground truth is limited to the foreground (in the case of Fig. B.17(e), the woman), leaving even complicated background unannotated. Similar arguments can be developed for the consistently lower AMR, AUE and AUV for the NYUV2 and SUNRGBD datasets compared to the BSDS500 dataset. For the SBD dataset, lower AMR, AUE and AUV can also be explained by the smaller average image size.

In conclusion, AMR, AUE and AUV accurately reflect the performance of superpixel algorithms and can be used to judge datasets. Across the different datasets, path-based and density-based algorithms perform poorly, while the remaining classes show mixed performance. However, some iterative energy optimization, clustering-based and graph-based algorithms, such as ETPS, SEEDS, CRS, ERS and SLIC, show favorable performance.

Fig. 12. Runtime in seconds on the BSDS500 and NYUV2 datasets. Watershed-based, some clustering-based algorithms as well as PF offer runtimes below 100ms. In the light of realtime applications, CW, W and PF even provide runtimes below 10ms. However, independent of the application at hand, we find runtimes below 1s beneficial for using superpixel algorithms as pre-processing tools.

7.2.1. Depth
Depth information does not necessarily improve performance regarding Rec, UE and EV. We consider Fig. 10(a)–(c), presenting Rec, UE and EV on the NYUV2 dataset. In particular, we consider DASP and VCCS. We observe that DASP consistently outperforms VCCS. Therefore, we consider the performance of DASP and investigate whether depth information improves performance. Note that DASP performs similar to SLIC, exhibiting slightly worse Rec and slightly better UE and EV for large K. However, DASP does not clearly outperform SLIC. As indicated in Section 3, DASP and SLIC are both clustering-based algorithms. In particular, both algorithms are based on k-means using color and spatial information, and DASP additionally utilizes depth information. This suggests that the clustering approach does not benefit from depth information. We note that a similar line of thought can be applied to VCCS, except that VCCS directly operates within a point cloud, rendering the comparison problematic. Still, we conclude that depth information used in the form of DASP does not improve performance. This might be in contrast to experiments with different superpixel algorithms, e.g. a SLIC variant using depth information as in Zhang et al. (2013). We suspect that regarding the used metrics, the number of superpixels (K = 200) and the used superpixel algorithm, the effect of depth information might be more pronounced in the experiments presented in Zhang et al. (2013) compared to ours. Furthermore, it should be noted that our evaluation is carried out in the 2D image plane, which does not directly reflect the segmentation of point clouds.

7.3. Runtime

Considering runtime, we report CPU time³ excluding connected components but including color space conversions if applicable. We made sure that no multi-threading or GPU computation were used. We begin by considering runtime in general, with a glimpse on realtime applications, before considering iterative algorithms.

We find that some well performing algorithms can be run at (near) realtime. We consider Fig. 12, showing runtime in seconds t on the BSDS500 (image size 481 × 321) and NYUV2 (image size 608 × 448) datasets. Concretely, considering the watershed-based algorithms W and CW, we can report runtimes below 10ms on both datasets, corresponding to roughly 100fps. Similarly, PF runs at below 10ms. Furthermore, several algorithms, such as SLIC, ERGC, FH, PB, MSS and preSLIC, provide runtimes below 80ms, and some of them are iterative, i.e. reducing the number of iterations may further reduce runtime. However, using the convention that realtime corresponds to roughly 30fps, this leaves preSLIC and MSS on the larger images of the NYUV2 dataset. However, even without explicit runtime requirements, we find runtimes below 1s per image to be beneficial for using superpixel algorithms as pre-processing tools, ruling out TPS, CIS, SEAW, RW and NC. Overall, several superpixel algorithms provide runtimes appropriate for pre-processing tools; realtime applications are still limited to a few fast algorithms.

Iterative algorithms offer to reduce runtime while gradually lowering performance. Considering Fig. 13, showing Rec, UE and runtime in seconds t for all iterative algorithms on the BSDS500 dataset, we observe that the number of iterations can safely be reduced to decrease runtime while lowering Rec and increasing UE only slightly. In the best case, for example considering ETPS, reducing the number of iterations from 25 to 1 reduces the runtime from 680ms to 58ms, while keeping Rec and UE nearly constant. For other cases, such as SEEDS⁴, Rec decreases abruptly when using less than 5 iterations. Still, runtime can be reduced from 920ms to 220ms. For CRS and CIS, runtime reduction is similarly significant, but both algorithms still exhibit higher runtimes. If post-processing is necessary, for example for SLIC and preSLIC, the number of iterations has to be fixed in advance. However, for other iterative algorithms, the number of iterations may be

Fig. 13. Rec, UE and runtime in seconds t for iterative algorithms with K ≈ 400 on the BSDS500 dataset. Some algorithms allow to gradually trade performance for runtime, reducing runtime by several 100ms in some cases.

³ Runtimes have been taken on an Intel® Core™ i7-3770 @ 3.4GHz, 64bit with 32GB RAM.
⁴ We note that we use the original SEEDS implementation instead of the OpenCV (Bradski, 2000) implementation, which is reported to be more efficient; see http://docs.opencv.org/3.0-last-rst/modules/ximgproc/doc/superpixels.html.
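As a minimal sketch of this runtime protocol (single-threaded CPU time, averaged per image), the helper below uses Python's `time.process_time`; the function name and the per-image callable interface are illustrative assumptions, not the benchmark's actual API:

```python
import time

def cpu_runtime(algorithm, images, repetitions=5):
    """Mean single-threaded CPU time per image in seconds.

    Mirrors the protocol above: CPU time rather than wall-clock time,
    averaged over several repetitions; connected-component post-processing
    would be excluded by timing only the core `algorithm` call.
    """
    start = time.process_time()
    for _ in range(repetitions):
        for image in images:
            algorithm(image)
    return (time.process_time() - start) / (repetitions * len(images))

# Using the ~30fps realtime convention, an algorithm qualifies if the
# returned value stays below 1/30 s per image.
```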
Fig. 14. Rec, UE and runtime in seconds t on the BSDS500 dataset for different implementations of SLIC, SEEDS and FH. In particular, reSEEDS and reFH show slightly better performance, which may be explained by improved connectivity. vlSLIC, in contrast, shows similar performance to SLIC and, indeed, the implementations are very similar. Finally, preSLIC reduces runtime by reducing the number of iterations spent on individual superpixels.

Fig. 15. The influence of salt and pepper noise for p ∈ {0, 0.04, 0.08, 0.12, 0.16} being the probability of salt or pepper. Regarding Rec and UE, most algorithms are not significantly influenced by salt and pepper noise. Algorithms such as QS and VC compensate the noise by generating additional superpixels.
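The perturbations underlying Figs. 15 and 16 can be reproduced in a few lines of NumPy; this is a sketch assuming grayscale images as 2D arrays with values in [0, 255] and odd filter sizes k, not the benchmark's actual C++ implementation:

```python
import numpy as np

def salt_and_pepper(image, p, rng=None):
    """Set each pixel to pepper (0) or salt (255) with probability p (Fig. 15 setup)."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    r = rng.random(image.shape)
    out[r < p / 2] = 0                 # pepper
    out[(r >= p / 2) & (r < p)] = 255  # salt
    return out

def average_blur(image, k):
    """k x k box filter with edge padding (Fig. 16 setup); k = 0 leaves the image unchanged."""
    if k == 0:
        return image.copy()
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for dy in range(k):                # accumulate the k x k neighborhood
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)
```

Superpixel algorithms would then be re-run on `salt_and_pepper(image, p)` for p ∈ {0.04, 0.08, 0.12, 0.16} and on `average_blur(image, k)` for k ∈ {5, 9, 13, 17}.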
adapted at runtime depending on the available time. Overall, iterative algorithms are beneficial as they are able to gradually trade performance for runtime.

We conclude that watershed-based as well as some path-based and clustering-based algorithms are candidates for realtime applications. Iterative algorithms, in particular many clustering-based and iterative energy optimization algorithms, may further be sped up by reducing the number of iterations, trading performance for runtime. On a final note, we want to remind the reader that the image sizes of all used datasets may be relatively small compared to today's applications. However, the relative runtime comparison is still valuable.

Fig. 16. The influence of average blur for k ∈ {0, 5, 9, 13, 17} being the filter size. As can be seen, blurring gradually reduces performance – which may be explained by vanishing image boundaries. In addition, for algorithms such as VC and QS, blurring also leads to fewer superpixels being generated.

7.4. Influence of implementations
Table 3
Average AMR and AUE, average ranks and rank distribution for each evaluated algorithm. To compute average ranks, the algorithms were ranked according to AMR + AUE (where the lowest AMR + AUE corresponds to the best rank, i.e. 1) on each dataset separately. For all algorithms (rows), the rank distribution (columns 1 through 28) illustrates the frequency with which a particular rank was attained over all considered datasets. We note that we could not evaluate RW, NC and SEAW on the SUNRGBD dataset, and DASP and VCCS cannot be evaluated on the BSDS500, SBD and Fash datasets.
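The ranking procedure behind Table 3 can be sketched as follows; `average_ranks` is a hypothetical helper, and ties are resolved by sort order since the exact tie-breaking rule is not specified here:

```python
def average_ranks(scores):
    """scores[dataset][algorithm] = AMR + AUE (lower is better).

    Ranks algorithms on each dataset separately (best = 1), skipping
    algorithms that could not be evaluated on a dataset (value None),
    and averages the attained ranks over all considered datasets.
    """
    ranks = {}
    for per_dataset in scores.values():
        evaluated = {a: s for a, s in per_dataset.items() if s is not None}
        for rank, algo in enumerate(sorted(evaluated, key=evaluated.get), start=1):
            ranks.setdefault(algo, []).append(rank)
    return {algo: sum(r) / len(r) for algo, r in ranks.items()}
```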
8. Conclusion
Table A.4
List of evaluated superpixel algorithms. First of all we present the used acronym (in parts consistent with Achanta et al., 2012 and Neubert
and Protzel, 2012), the corresponding publication, the year of publication and the number of Google Scholar citations as of October 13,
2016. We present a coarse categorization which is discussed in Section 3. We additionally present implementation details such as the
programming language, supported color spaces and provided parameters. The color space used for evaluation is marked by a square.
(see the ranking in Section 7.7) and low standard deviation in Rec, UE and EV. We observe a correlation of −0.47 between Rec and UE, reflecting that SEEDS exhibits high Rec but comparably lower UE on the BSDS500 dataset. This justifies the choice of using both Rec and UE for quantitative comparison. The correlation of UE and ASA is 1, which we explained with the respective definitions. More interestingly, the correlation of UE and UELevin is −0.7. Therefore, we decided to discuss results regarding UELevin in more detail in Appendix E. Nevertheless, this may also confirm the observations by Neubert and Protzel (2012) as well as Achanta et al. (2012) that UELevin unjustly penalizes large superpixels. The high correlation of −0.97 between MDE and Rec has also been explained using the respective definitions. Interestingly, the correlation decreases with increased K. This can, however, be explained by the implementation of Rec, allowing a fixed tolerance of 0.0025 times the image diagonal. The correlation of −0.42 between ICV and EV was explained by the missing normalization of ICV compared to EV. This observation is confirmed by the decreasing correlation for larger K as, on average, superpixels get smaller, thereby reducing the influence of normalization.

C2. Average Miss Rate, Average Undersegmentation Error and Average Unexplained Variation

As introduced in Section 5.2, AMR, AUE and AUV are intended to summarize algorithm performance independent of K. To this end, we compute the area below the MR = (1 − Rec), UE and UV = (1 − EV) curves within the interval [K_min, K_max] = [200, 5200]. As introduced before, the first corresponds to the Boundary Miss Rate (MR) and the last is referred to as Unexplained Variation (UV). In particular, we use the trapezoidal rule for integration. As the algorithms do not necessarily generate the desired number of superpixels, we additionally considered the following two cases for special treatment. First, if an algorithm generates more than K_max superpixels (or less than K_min), we interpolate linearly to determine the value for K_max (K_min). Second, if an algorithm consistently generates less than K_max (or more than K_min) superpixels, we take the value lower or equal (greater or equal) and closest to K_max (K_min). In the second case, a superpixel algorithm is penalized if it is not able to generate very few (i.e. K_min) or very many (i.e. K_max) superpixels.
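Following this description, the area computation can be sketched in a few lines; `area_metric` is a hypothetical name, and `np.interp` conveniently covers both special cases (linear interpolation at the interval ends when the data crosses them, clamping to the closest available value when it does not):

```python
import numpy as np

def area_metric(ks, values, k_min=200, k_max=5200):
    """Area below a metric curve (e.g. MR = 1 - Rec) over [k_min, k_max].

    ks: generated superpixel counts, sorted ascending;
    values: the metric value observed at each K.
    """
    ks = np.asarray(ks, dtype=float)
    vs = np.asarray(values, dtype=float)
    inner = ks[(ks > k_min) & (ks < k_max)]
    grid = np.unique(np.concatenate(([float(k_min), float(k_max)], inner)))
    # np.interp interpolates linearly inside the data range and clamps to the
    # closest value outside it, penalizing algorithms that cannot reach K_min/K_max.
    interp = np.interp(grid, ks, vs)
    # trapezoidal rule
    return float(np.sum((interp[1:] + interp[:-1]) / 2.0 * np.diff(grid)))
```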
Table C.5
Pearson correlation coefficient of all discussed metrics exemplarily shown for SEEDS with
K ≈ 400, K ≈ 1200 and K ≈ 3600.
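The coefficients in Table C.5 follow the standard Pearson definition; a minimal sketch for two paired metric series (e.g. per-image UE and ASA values):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two paired metric series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()  # center both series
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```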
Appendix D. Parameter optimization

We discuss the following two topics concerning parameter optimization in more detail: color spaces and controlling the number of superpixels in a consistent manner. Overall, we find that, together with Section 6, the described parameter optimization procedure ensures fair comparison as far as possible.

D1. Color spaces

The used color space inherently influences the performance of superpixel algorithms, as the majority of superpixel algorithms depend on comparing pixels within this color space. To ensure fair comparison, we included the color space in parameter optimization. In particular, we ensured that all algorithms support RGB color space and considered different color spaces only if reported in the corresponding publications or supported by the respective implementation. While some algorithms may benefit from different color spaces not mentioned in the corresponding publications, we decided not to consider additional color spaces for simplicity and to avoid additional overhead during parameter optimization. Parameter optimization yielded the color spaces highlighted in Table A.4.

D2. Controlling the number of generated superpixels

Some implementations, for example ERS and POISE, control the number of superpixels directly – for example by stopping the merging of pixels as soon as the desired number of superpixels is met. In contrast, clustering-based algorithms (except for DASP), contour evolution algorithms, watershed-based algorithms as well as path-based algorithms utilize a regular grid to initialize superpixels. Some algorithms allow to adapt the grid in both horizontal and vertical direction, while others require a Cartesian grid. We expected this difference to be reflected in the experimental results; however, this is not the case. We standardized initialization in both cases.

Appendix E. Experiments

We complement Section 7 with additional experimental results. In particular, we provide additional qualitative results to better judge the visual quality of individual superpixel algorithms. Furthermore, we explicitly present ASA and UELevin on the BSDS500 and NYUV2 datasets as well as Rec, UE and EV on the SBD, SUNRGBD and Fash datasets.

E1. Qualitative

We briefly discuss visual quality on additional examples provided in Figs. E.18 and E.19. Additionally, Fig. E.20 shows the influence of the compactness parameter on superpixel algorithms not discussed in Section 7.1.

Most algorithms exhibit good boundary adherence, especially for large K. In contrast to the discussion in Section 7.1, focussing on qualitative results with K ≈ 400 and K ≈ 1200, Figs. E.18 and E.19 also show results for K ≈ 3600. We observe that with rising K, most algorithms exhibit better boundary adherence. Exceptions are, again, easily identified: FH, QS, CIS, PF, PB, TPS and SEAW. Still, due to higher K, the effect of missed image boundaries is not as serious as with fewer superpixels. Overall, the remaining algorithms show good boundary adherence, especially for high K.

Compactness increases with higher K; still, a compactness parameter is beneficial. While for higher K superpixels tend to be more compact in general, the influence of parameter optimization with respect to Rec and UE is still visible – also for algorithms providing a compactness parameter. For example, ERGC or ETPS exhibit more irregular superpixels compared to SLIC or CCS. Complementing this discussion, Fig. E.20 shows the influence of the compactness parameter for the algorithms with compactness parameter not discussed in detail in Section 7.1. It can be seen that a compactness parameter allows to gradually trade boundary adherence for compactness in all of the presented cases. However, higher K also induces higher compactness for algorithms not providing a compactness parameter, such as CIS, RW, W or MSS, to name only a few examples. Overall, compactness benefits from higher K.

Overall, higher K induces both better boundary adherence and higher compactness, independent of whether a compactness parameter is involved.

Fig. E.18. Qualitative results on the BSDS500, SBD and Fash datasets; excerpts from the images in Fig. 1 are shown for K ≈ 1200, in the upper left corner, and K ≈ 3600, in the lower right corner. Superpixel boundaries are depicted in black; best viewed in color. We observe that with higher K both boundary adherence and compactness increase, even for algorithms not offering a compactness parameter.

Fig. E.19. Qualitative results on the NYUV2 and SUNRGBD datasets; excerpts from the images in Fig. 1 are shown for K ≈ 1200, in the upper left corner, and K ≈ 3600, in the lower right corner. Superpixel boundaries are depicted in black. NC, RW and SEAW could not be evaluated on the SUNRGBD dataset due to excessive memory usage of the corresponding MatLab implementations. Therefore, results on the NYUV2 dataset are shown.

E2. Quantitative

The following experiments complement the discussion in Section 7.2 in two regards. First, we present additional experiments considering both ASA and UELevin on the BSDS500 and NYUV2 datasets. Then, we consider Rec, UE and EV in more detail for the remaining datasets, i.e. the SBD, SUNRGBD and Fash datasets. We begin by discussing ASA and UELevin, also in regard to the observations made in Section 5.1 and Appendix C.
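The regular grid initialization described in Appendix D2 above can be sketched as follows; the seed layout and rounding are illustrative assumptions rather than any specific implementation's scheme:

```python
import math

def grid_seeds(width, height, k):
    """Place roughly k seeds on a regular grid adapted to the aspect ratio."""
    step = math.sqrt(width * height / k)  # target superpixel edge length
    cols = max(1, round(width / step))    # grid adapted in both directions
    rows = max(1, round(height / step))
    return [((x + 0.5) * width / cols, (y + 0.5) * height / rows)
            for y in range(rows) for x in range(cols)]
```

For a 481 × 321 BSDS500 image and K = 400 this yields a 24 × 16 grid, i.e. 384 seeds, illustrating why, without further control, the desired number of superpixels is only met approximately.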
Fig. E.21. Rec, UE and EV on the SBD, SUNRGBD and Fash datasets. Similar to the results presented for the BSDS500 and NYUV2 datasets (compare Figs. 9 and 10), Rec and UE give a rough overview of algorithm performance with respect to ground truth. Concerning Rec, we observe similar performance across the three datasets, while algorithms may show different behavior with respect to UE. Similarly, EV gives a ground truth independent overview of algorithm performance, where algorithms show similar performance across datasets.

E3. Runtime
Fig. E.22. Runtime in seconds t on the SBD, SUNRGBD and Fash datasets. The results give an impression of how the runtime of individual algorithms scales with the size of the image. In particular, we deduce that most algorithms' runtime scales linearly in the input size, while the number of generated superpixels has little influence.
Fig. E.23. UE, ASA and UELevin on the BSDS500 and NYUV2 datasets. We find that ASA does not provide new insights compared to UE, as it closely reflects (1 − UE) except for a minor absolute offset. UELevin, in contrast, provides a different point of view compared to UE. However, UELevin is harder to interpret and varies strongly across datasets.
as CW, PF, preSLIC, MSS or SLIC. Except for RW, QS and SEAW, we also notice that the number of generated superpixels does not influence runtime significantly. Overall, the results confirm the claim of many authors that algorithms scale linearly in the input size, while the number of generated superpixels has little influence.

References

Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S., 2010. SLIC Superpixels. Technical Report. École Polytechnique Fédérale de Lausanne.
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S., 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34 (11), 2274–2281.
Andres, B., Köthe, U., Helmstaedter, M., Denk, W., Hamprecht, F.A., 2008. Segmentation of SBFSEM volume data of neural tissue by hierarchical classification. In: DAGM Annual Pattern Recognition Symposium, pp. 142–152.
Arbeláez, P., Maire, M., Fowlkes, C., Malik, J., 2011. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33 (5), 898–916.
Arbeláez, P.A., Pont-Tuset, J., Barron, J.T., Marqués, F., Malik, J., 2014. Multiscale combinatorial grouping. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 328–335.
Benesova, W., Kottman, M., 2014. Fast superpixel segmentation using morphological processing. In: Conference on Machine Vision and Machine Learning.
van den Bergh, M., Boix, X., Roig, G., de Capitani, B., van Gool, L., 2012. SEEDS: superpixels extracted via energy-driven sampling. In: European Conference on Computer Vision, vol. 7578, pp. 13–26.
van den Bergh, M., Boix, X., Roig, G., de Capitani, B., van Gool, L., 2013. SEEDS: superpixels extracted via energy-driven sampling. Comput. Res. Reposit. abs/1309.3848.
van den Bergh, M., Roig, G., Boix, X., Manen, S., van Gool, L., 2013. Online video seeds for temporal window objectness. In: International Conference on Computer Vision, pp. 377–384.
van den Bergh, M., Carton, D., van Gool, L., 2013. Depth SEEDS: recovering incomplete depth data using superpixels. In: Winter Conference on Applications of Computer Vision, pp. 363–368.
Bódis-Szomorú, A., Riemenschneider, H., van Gool, L., 2015. Superpixel meshes for fast edge-preserving surface reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2011–2020.
Bradski, G., 2000. The OpenCV library. Dr. Dobb's Journal of Software Tools. http://opencv.org/.
Butler, D.J., Wulff, J., Stanley, G.B., Black, M.J., 2012. A naturalistic open source movie for optical flow evaluation. In: European Conference on Computer Vision, pp. 611–625.
Buyssens, P., Gardin, I., Ruan, S., 2014. Eikonal based region growing for superpixels generation: application to semi-supervised real time organ segmentation in CT images. Innovat. Res. BioMed. Eng. 35 (1), 20–26.
Carr, P., Hartley, R.I., 2009. Minimizing energy functions on 4-connected lattices using elimination. In: International Conference on Computer Vision, pp. 2042–2049.
Conrad, C., Mertz, M., Mester, R., 2013. Contour-relaxed superpixels. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 280–293.
Criminisi, A., 2004. Microsoft Research Cambridge object recognition image database. http://research.microsoft.com/en-us/projects/objectclassrecognition/.
Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J.
He, S., Lau, R.W.H., Liu, W., Huang, Z., Yang, Q., 2015. SuperCNN: a superpixelwise convolutional neural network for salient object detection. Int. J. Comput. Vis. 115 (3), 330–344.
Hoiem, D., Efros, A.A., Hebert, M., 2005. Automatic photo pop-up. ACM Trans. Graph. 24 (3), 577–584.
Hoiem, D., Efros, A.A., Hebert, M., 2007. Recovering surface layout from an image. Int. J. Comput. Vis. 75 (1), 151–172.
Hoiem, D., Stein, A.N., Efros, A.A., Hebert, M., 2007. Recovering occlusion boundaries from a single image. In: International Conference on Computer Vision, pp. 1–8.
Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K., Darrell, T., 2011. A category-level 3-d object dataset: putting the kinect to work. In: International Conference on Computer Vision Workshops, pp. 1168–1174.
Kim, K.-S., Zhang, D., Kang, M.-C., Ko, S.-J., 2013. Improved simple linear iterative clustering superpixels. In: International Symposium on Consumer Electronics, pp. 259–260.
Koniusz, P., Mikolajczyk, K., 2009. Segmentation based interest points and evaluation of unsupervised image segmentation methods. In: British Machine Vision Conference, pp. 1–11.
Lerma, C.D.C., Kosecka, J., 2014. Semantic segmentation with heterogeneous sensor coverages. In: IEEE International Conference on Robotics and Automation, pp. 2639–2645.
Levinshtein, A., Stere, A., Kutulakos, K.N., Fleet, D.J., Dickinson, S.J., Siddiqi, K., 2009. TurboPixels: fast superpixels using geometric flows. IEEE Trans. Pattern Anal. Mach. Intell. 31 (12), 2290–2297.
Lin, D., Fidler, S., Urtasun, R., 2013. Holistic scene understanding for 3d object detection with RGBD cameras. In: International Conference on Computer Vision, pp. 1417–1424.
Lin, T.-Y., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014. Microsoft COCO: common objects in context. Comput. Res. Reposit. abs/1405.0312.
Liu, F., Shen, C., Lin, G., 2014. Deep convolutional neural fields for depth estimation from a single image. Comput. Res. Reposit. abs/1411.6387.
Liu, M., Salzmann, M., He, X., 2014. Discrete-continuous depth estimation from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 716–723.
Liu, S., Feng, J., Domokos, C., Xu, H., Huang, J., Hu, Z., Yan, S., 2014. Fashion parsing with weak color-category labels. IEEE Trans. Multimedia 16 (1), 253–265.
Lu, J., Yang, H., Min, D., Do, M.N., 2013. Patch match filter: efficient edge-aware filtering meets randomized search for fast correspondence field estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1854–1861.
Lucchi, A., Smith, K., Achanta, R., Knott, G., Fua, P., 2012. Supervoxel-based segmen-
Mach. Learn. Res. 7, 1–30. tation of mitochondria in EM image stacks with learned shape features. IEEE
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. ImageNet: a large-s- Trans. Med. Imaging 31 (2), 474–486.
cale hierarchical image database. In: IEEE Conference on Computer Vision and Lucchi, A., Smith, K., Achanta, R., Lepetit, V., Fua, P., 2010. A fully automated ap-
Pattern Recognition. proach to segmentation of irregularly shaped cellular structures in EM images.
Dollár, P., Wojek, C., Schiele, B., Perona, P., 2009. Pedestrian detection: a benchmark. In: International Conference on Medical Image Computing and Computer As-
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 304–311. sisted Interventions, pp. 463–471.
Dollár, P., Zitnick, C.L., 2013. Structured forests for fast edge detection. In: Interna- Lui, M.Y., Tuzel, O., Ramalingam, S., Chellappa, R., 2011. Entropy rate superpixel seg-
tional Conference on Computer Vision, pp. 1841–1848. mentation. In: IEEE Conference on Computer Vision and Pattern Recognition,
Dong, J., Chen, Q., Xia, W., Huang, Z., Yan, S., 2013. A deformable mixture pars- pp. 2097–2104.
ing model with parselets. In: International Conference on Computer Vision, Lv, J., 2015. An improved slic superpixels using reciprocal nearest neighbor cluster-
pp. 3408–3415. ing. Int. J. Signal Process. Image Process. Pattern Recognit. 8 (5), 239–248.
Du, Y., Shen, J., Yu, X., Wang, D., 2012. Superpixels using random walker. In: IEEE Everingham, L.M., van Gool, L., Williams, C. K. I., Winn, J., Zisserman, A., 2007.
Global High Tech Congress on Electronics, pp. 62–63. The PASCAL Visual Object Classes Challenge (VOC2007) Results. http://www.
Ducournau, A., Rital, S., Bretto, A., Laget, B., 2010. Hypergraph coarsening for im- pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
age superpixelization. In: International Symposium on I/V Communications and Malladi, S.R.S.P., Ram, S., Rodriguez, J.J., 2014. Superpixels using morphology for rock
Mobile Network, pp. 1–4. image segmentation. In: IEEE Southwest Symposium on Image Analysis and In-
Engel, D., Spinello, L., Triebel, R., Siegwart, R., Bülthoff, H.H., Curio, C., 2009. Me- terpretation, pp. 145–148.
dial features for superpixel segmentation. In: IAPR International Conference on Marcotegui, B., Meyer, F., 1997. Bottom-up segmentation of image sequences for
Machine Vision Applications, pp. 248–252. coding. Ann. Télécommun. 52 (7), 397–407.
Felzenswalb, P.F., Huttenlocher, D.P., 2004. Efficient graph-based image segmenta- Martin, D., Fowlkes, C., Malik, J., 2004. Learning to detect natural image boundaries
tion. Int. J. Comput. Vis. 59 (2), 167–181. using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach.
Freifeld, O., Li, Y., Fisher, J.W., 2015. A fast method for inferring high-quality sim- Intell. 26 (5), 530–549.
ply-connected superpixels. In: International Conference on Image Processing, Menze, M., Geiger, A., 2015. Object scene flow for autonomous vehicles. In: IEEE
pp. 2184–2188. Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts,
Gadde, R., Jampani, V., Kiefel, M., Gehler, P.V., 2015. Superpixel convolutional net- pp. 3061–3070.
works using bilateral inceptions. Comput. Res. Reposit. abs/1511.06739. Mester, R., Conrad, C., Guevara, A., 2011. Multichannel Segmentation Using Contour
Geiger, A., Wang, C., 2015. Joint 3d object and layout inference from a single RGB-D Relaxation: Fast Super-pixels and Temporal Propagation. In: Scandinavian Con-
image. In: German Conference on Pattern Recognition, pp. 183–195. ference Image Analysis, pp. 250–261.
Gonfaus, J.M., Bosch, X.B., van de Weijer, J., Bagdanov, A.D., Serrat, J., Gonzàlez, J., Mester, R., Franke, U., 1988. Statistical model based image segmentation using re-
2010. Harmony potentials for joint classification and segmentation. In: IEEE gion growing, contour relaxation and classification. In: SPIE Symposium on Vi-
Conference on Computer Vision and Pattern Recognition, pp. 3280–3287. sual Communications and Image Processing, pp. 616–624.
Gould, S., Fulton, R., Koller, D., 2009. Decomposing a scene into geometric and se- Meyer, F., 1992. Color image segmentation. In: International Conference on Image
mantically consistent regions. In: International Conference on Computer Vision, Processing and its Applications, pp. 303–306.
pp. 1–8. Moore, A.P., Prince, S.J.D., Warrell, J., Mohammed, U., Jones, G., 2008. Superpixel lat-
Gould, S., Rodgers, J., Cohen, D., Elidan, G., Koller, D., 2008. Multi-class segmentation tices. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
with relative location prior. Int. J. Comput. Vis. 80 (3), 300–316. Morerio, P., Marcenaro, L., Regazzoni, C.S., 2014. A generative superpixel method. In:
Grundmann, M., Kwatra, V., Han, M., Essa, I.A., 2010. Efficient hierarchical International Conference on Information Fusion, pp. 1–7.
graph-based video segmentation. In: IEEE Conference on Computer Vision and Neubert, P., Protzel, P., 2012. Superpixel benchmark and comparison. Forum Bildver-
Pattern Recognition, pp. 2141–2148. arbeitung.
Gupta, S., Arbeláez, P.A., Girshick, R.B., Malik, J., 2015. Indoor scene understanding Neubert, P., Protzel, P., 2013. Evaluating superpixels in video: Metrics beyond fig-
with RGB-D images: bottom-up segmentation, object detection and semantic ure-ground segmentation. In: British Machine Vision Conference.
segmentation. Int. J. Comput. Vis. 112 (2), 133–149. Neubert, P., Protzel, P., 2014. Compact watershed and preemptive SLIC: on improving
Haas, S., Donner, R., Burner, A., Holzer, M., Langs, G., 2011. Superpixel-based inter- trade-offs of superpixel segmentation algorithms. In: International Conference
est points for effective bags of visual words medical image retrieval. In: Inter- on Pattern Recognition, pp. 996–1001.
national Conference onMedical Image Computing and Computer Assisted Inter-
ventions, pp. 58–68.
D. Stutz et al. / Computer Vision and Image Understanding 166 (2018) 1–27 27
Perazzi, F., Krähenbühl, P., Pritch, Y., Hornung, A., 2012. Saliency filters: contrast based filtering for salient region detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 733–740.
Perbet, F., Maki, A., 2011. Homogeneous superpixels from random walks. In: Conference on Machine Vision and Applications, pp. 26–30.
Rantalankila, P., Kannala, J., Rahtu, E., 2014. Generating object segmentation proposals using global and local search. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2417–2424.
Ren, X., Bo, L., 2012. Discriminatively trained sparse code gradients for contour detection. In: Neural Information Processing Systems, pp. 593–601.
Ren, X., Malik, J., 2003. Learning a classification model for segmentation. In: International Conference on Computer Vision, pp. 10–17.
Ren, Z., Shakhnarovich, G., 2013. Image segmentation by cascaded region agglomeration. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2011–2018.
Rohkohl, C., Engel, K., 2007. Efficient image segmentation using pairwise pixel similarities. In: DAGM Annual Pattern Recognition Symposium, pp. 254–263.
Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T., 2008. LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77 (1–3), 157–173.
Schick, A., Fischer, M., Stiefelhagen, R., 2012. Measuring and evaluating the compactness of superpixels. In: International Conference on Pattern Recognition, pp. 930–934.
Shen, J., Du, Y., Wang, W., Li, X., 2014. Lazy random walks for superpixel segmentation. Trans. Image Process. 23 (4), 1451–1462.
Shi, J., Malik, J., 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22 (8), 888–905.
Shu, G., Dehghan, A., Shah, M., 2013. Improving an object detector and extracting regions using superpixels. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3721–3727.
Silberman, N., Hoiem, D., Kohli, P., Fergus, R., 2012. Indoor segmentation and support inference from RGBD images. In: European Conference on Computer Vision, pp. 746–760.
Siva, P., Wong, A., 2014. Grid seams: a fast superpixel algorithm for real-time applications. In: Conference on Computer and Robot Vision, pp. 127–134.
Song, S., Lichtenberg, S.P., Xiao, J., 2015. SUN RGB-D: a RGB-D scene understanding benchmark suite. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 567–576.
Strassburg, J., Grzeszick, R., Rothacker, L., Fink, G.A., 2015. On the influence of superpixel methods for image parsing. In: International Conference on Computer Vision Theory and Application, pp. 518–527.
Tighe, J., Lazebnik, S., 2010. SuperParsing: scalable nonparametric image parsing with superpixels. In: European Conference on Computer Vision, pp. 352–365.
Vedaldi, A., Fulkerson, B., 2008. VLFeat: an open and portable library of computer vision algorithms. http://www.vlfeat.org/.
Veksler, O., Boykov, Y., Mehrani, P., 2010. Superpixels and supervoxels in an energy optimization framework. In: European Conference on Computer Vision, vol. 6315, pp. 211–224.
Wang, S., Lu, H., Yang, F., Yang, M.-H., 2011. Superpixel tracking. In: International Conference on Computer Vision, pp. 1323–1330.
Weikersdorfer, D., Gossow, D., Beetz, M., 2012. Depth-adaptive superpixels. In: International Conference on Pattern Recognition, pp. 2087–2090.
Xiao, J., Owens, A., Torralba, A., 2013. SUN3D: a database of big spaces reconstructed using SfM and object labels. In: International Conference on Computer Vision, pp. 1625–1632.
Xu, C., Corso, J.J., 2012. Evaluation of super-voxel methods for early video processing. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1202–1209.
Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L., 2012. Parsing clothing in fashion photographs. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3570–3577.
Yamaguchi, K., McAllester, D.A., Urtasun, R., 2014. Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In: European Conference on Computer Vision, pp. 756–771.
Yan, J., Yu, Y., Zhu, X., Lei, Z., Li, S.Z., 2015. Object detection by labeling superpixels. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5107–5116.
Yang, F., Lu, H., Yang, M.-H., 2014. Robust superpixel tracking. Trans. Image Process. 23 (4), 1639–1651.
Yang, J., Gan, Z., Gui, X., Li, K., Hou, C., 2013. 3-d geometry enhanced superpixels for RGB-D data. In: Pacific-Rim Conference on Multimedia, pp. 35–46.
Yao, J., Boben, M., Fidler, S., Urtasun, R., 2015. Real-time coarse-to-fine topologically preserving segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2947–2955.
Zeng, G., Wang, P., Wang, J., Gan, R., Zha, H., 2011. Structure-sensitive superpixels via geodesic distance. In: International Conference on Computer Vision, pp. 447–454.
Zhang, J., Kan, C., Schwing, A.G., Urtasun, R., 2013. Estimating the 3d layout of indoor scenes and its clutter from depth sensors. In: International Conference on Computer Vision, pp. 1273–1280.
Zhang, Y., Hartley, R., Mashford, J., Burn, S., 2011. Superpixels, occlusion and stereo. In: International Conference on Digital Image Computing Techniques and Applications, pp. 84–91.