This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2660244, IEEE Transactions on Image Processing
1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 2. Toy example of the proposed method. The query image is marked with a yellow bounding box, and relevant ones with green. Given a query image, two features are used to obtain search results. Then, for each feature, the corresponding ImageGraph is built. In the ImageGraph, each vertex points to its 3 nearest neighbors and the graph is expanded to the second layer. Each edge is weighted by Bayes similarity, reflecting the retrieval quality. In ImageGraph 1, we observe that only 1 relevant image is directly connected to the query, which means there is 1 true match in the top-3 ranked images of the initial rank list of Feature 1. Through these relevant images, the other two true matches are connected at the second layer of the graph. In ImageGraph 2, the query points to two relevant images directly. ImageGraph 1 and ImageGraph 2 are fused by appending new nodes or re-calculating the edge weights of existing nodes. Based on the fused graph, local ranking is conducted and the images are reranked. Although there are many outliers in the graph, all the true match images are retrieved.
fusion task. Different features may produce scores diverse in numerical values, so the evaluation scheme should measure the importance of images on a unified scale. Besides, a good evaluation should measure feature effectiveness correctly, assigning higher weight to relevant images under good features and lower weight to highly-ranked outliers under bad features.

In light of the above analysis, this paper first proposes the Rank Distance to measure the relevance of two images at rank level, based on their ranks when each one is used as query to search for the other. Through this measurement, the similarity scores of different features are mapped to a unified scale, thus being comparable. Besides, it is illustrated in [12, 54] that the reciprocal neighborhood relationship is a stronger indicator of similarity than the unidirectional nearest neighborhood relationship. Since Rank Distance considers the reciprocal ranks of two images, i.e., the local densities of vectors, it is more reliable for representing the relevance of images than similarity score. Then, to evaluate the retrieval quality of individual features effectively, we introduce the Bayes similarity. It is defined as the posterior probability of two images being a true match. Built on the Rank Distance, we estimate the Bayes similarity through empirical study.

Our approach adopts the graph-based framework of [17]. Since not only the top-ranked images in the initial search results but also their neighborhoods are included in the graph, similarity can be propagated through the graph. Consequently, true match images not directly connected to the query can be retrieved. Nevertheless, the undirected graph proposed in [17] builds on K-reciprocal neighbors, which may result in low search recall. In contrast with [17], we construct a directed graph, denoted as ImageGraph. Our method uses the top-K ranked images, so that more candidates (high recall) can be included in the graph. In addition, we define the edge weight of ImageGraph as Bayes similarity, a better discriminator between relevant/irrelevant images than Jaccard similarity [17], and one that is insensitive to parameter changes. Besides, to avoid being affected by outliers in reranking, local ranking is proposed to re-order the initial result. It aims at local optimization, and thus is more robust to global outliers. Extensive experiments on four image retrieval datasets confirm that the proposed method significantly improves baseline performance. Moreover, it is robust to outliers. A toy example of our fusion system is illustrated in Fig. 2.

The main contributions of this paper are summarized as follows:
- We propose an effective measurement for robust fusion. Rank Distance is first introduced to measure the relevance of images at rank level. Based on it, Bayes similarity is proposed to evaluate the retrieval quality of individual features, which is a good discriminator between relevant/irrelevant images and insensitive to parameter changes.
- We propose the directed ImageGraph structure to encode image-level relationships. ImageGraph builds on K nearest neighbors, and thus more candidates can be included in the graph, improving the recall. Besides, the edge weight of ImageGraph is measured by Bayes similarity.
- We propose local ranking to rerank the initial search result, further enhancing the robustness of our method. The proposed ranking algorithm aims at local optimization, so that it is more robust to global outliers.

This paper is an extension of our previous conference publication [51]. Beyond the conference paper, we propose Rank Distance and Bayes similarity for robust evaluation, and reformulate the edge weight of ImageGraph. We also conduct more experiments to better validate the effectiveness of our method, and give more detailed discussions. The rest
ZIQIONG LIU et al.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 3
of the paper is organized as follows. After a brief review of related work in Section II, we introduce the proposed robust ImageGraph in Section III. Section IV describes the datasets and baselines used in the experiments. Section V presents the experimental results. Finally, conclusions are given in Section VI.

II. RELATED WORK

A. Image Search Pipeline

In image search, a myriad of methods have been proposed in the last decade. Among them, the Bag-of-Words model [23] based on local descriptors is the most popular one. A number of salient local regions are detected from an image with operators such as DoG [19] and Hessian Affine [20]. Subsequently, the extracted regions are represented as high-dimensional feature vectors using SIFT [19] or its variants [21]. Each descriptor is quantized to its nearest visual word in a pre-trained codebook. The codebook is obtained through an unsupervised clustering method, e.g., approximate k-means (AKM) [22] or hierarchical k-means (HKM) [18], and the cluster centers are treated as the visual words of the codebook. Through quantization, each image is represented as a sparse histogram of visual words. Then, fast search is achieved using an inverted file [35] and TF-IDF [23, 24, 33] weights.

It is verified in many works that post-processing can further enhance the quality of search results. Quite a few works refine the initial results using spatial cues, such as [25, 27]. Besides, query expansion [41] uses highly ranked images to learn a latent feature model to expand the original query, improving the recall. Recent studies of reranking adopt image-level cues. For example, K-NN reranking [8] refines the initial rank list automatically using the K nearest neighbors. Alternatively, Qin et al. [12] take advantage of K-reciprocal nearest neighbors to identify the image set. In addition, many works conduct reranking based on complementary cues [15-17], which have shown promising performance. By combining the rank lists or scores of multiple features, the recall is significantly improved and the system is able to find quite challenging occurrences of the query. To some extent, our method belongs to the post-processing methods.

Besides, there are also efforts to represent images using their global properties, such as GIST [36, 37], visual attributes [2, 4, 7] and deep learning features [31, 32, 34, 57, 58]. Such holistic features demonstrate their advantages in image search. They also serve as good complements to local ones. Additionally, global features are effective for encoding images with relatively fewer bits, and are usually combined with dimensionality reduction and approximate nearest neighbor search [38-40].

B. Graph-based Ranking

Graph-based visual reranking has been proven effective for refining text-based video and image search results, integrating both the initial ranking and the visual consistency between images. It constructs a graph where pairs of visually similar images are connected by an edge. The initial rank information is propagated through the graph until convergence. Jing and Baluja [5] have proposed the VisualRank framework to efficiently model the similarity of Google image search results with a graph. It uses a random walk on an affinity graph, and re-orders images according to the visual hyperlinks. In [46], video search reranking is also formulated as a random walk problem along the context graph. The edge between videos is weighted by a linear combination of text score and visual duplicate score. Here, the visual duplicate score is the similarity calculated with visual features. To handle errors in the initial labeled set, graph-based semi-supervised learning [47] is applied to web image search. Furthermore, a graph theoretical framework amenable to noise-resistant ranking is proposed in [45]. In this method, outliers can be removed from the graph by spectral filtering.

Graph-based methods have also received increased attention recently in content-based image search. Xie et al. [42] employ ImageWeb to discover the nature of image relationships for refining similar image search results. A directed graph is constructed, and the edge weight is computed as the count of matched features between pairwise images. Then, HITS [50] is employed to rank images using the affinity values. From the graph-based perspective, incremental query expansion and image-feature voting are developed in [43]. Specifically, Zhang et al. [17] propose an undirected graph-based query specific fusion approach, through which multiple retrieval sets are merged. In this approach, images satisfying the reciprocal neighbor relation are connected. The edge weight is measured by the consistency of their neighborhoods, i.e., Jaccard similarity. Then images are re-ordered through a link analysis method. Based on this framework, weakly supervised multi-graph learning is proposed in [15] for enhancing the reranking performance. Instead, we adopt a directed graph model, in which an image is connected to its top-K ranked images. Specifically, the edge is weighted by Bayes similarity. Further, to be robust to outliers, a safe strategy is used for ranking.

C. Feature Fusion

It is indicated that the combination of multiple features obtains superior performance in image search. In [30], the attribute vector and Fisher vector are combined at feature level. The fused feature is compressed into small codes by product quantization. This method improves performance for particular object retrieval as well as categories. Another promising strategy performs feature fusion at indexing level. In [28], a color signature is embedded in the inverted index to filter out false positive SIFT matches. To model the correlation between features, a multi-IDF scheme is introduced in [11], through which different binary features are coupled into the inverted file. Zheng et al. [14] propose a multi-dimensional inverted index, where each dimension corresponds to one kind of feature. With the multi-index, the retrieval process votes for images in both SIFT and other feature spaces. In addition, the semantic-aware co-indexing algorithm [29] leverages global semantic attributes to update the inverted indexes of local features, encouraging semantic consensus among locally similar images.

For the late fusion, Zhang et al. [17] propose the graph-based query specific fusion approach at rank level. In this
TABLE I
NOTATIONS AND DEFINITIONS

Notation               Definition
I = {I1, I2, ..., IN}  The image set; Ii indicates the i-th image.
N                      Total number of images in the dataset.
R(Im, In)              Rank of In in the rank list when Im is the query.
NK(Im)                 K nearest neighbors of Im.
G = (V, E, w)          A graph; V, E and w indicate the set of vertices, the set of edges, and the corresponding edge weights, respectively.
Gs = (Vs, Es, w)       Subgraph of ImageGraph G induced by the vertex set Vs ⊆ V; Es contains every edge between the vertices in Vs.
d(Im, In)              Rank Distance between images Im and In.
K                      The breadth of ImageGraph.
P                      The depth of ImageGraph.
T(Im)                  True match image set of Im.
F(Im)                  False match image set of Im.
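For concreteness, the notations of Table I map naturally onto simple data structures. The following Python sketch (the container names are ours, not from the paper) encodes the graph G = (V, E, w) and the induced subgraph Gs:

```python
from dataclasses import dataclass, field

# N images I_1..I_N are referred to by integer index.
# A rank table R[m][n] would store R(I_m, I_n); a neighbor table
# would store N_K(I_m) for each image.

@dataclass
class ImageGraph:
    """Directed graph G = (V, E, w) over the image set."""
    vertices: set                               # V: image indices
    edges: set                                  # E: (m, n) pairs, edge m -> n
    weight: dict = field(default_factory=dict)  # w: (m, n) -> edge weight

def induced_subgraph(g: ImageGraph, vs: set) -> ImageGraph:
    """G_s = (V_s, E_s, w): keep every edge of G between vertices in V_s."""
    es = {(m, n) for (m, n) in g.edges if m in vs and n in vs}
    return ImageGraph(vs, es, {e: g.weight[e] for e in es})
```

The induced-subgraph helper mirrors the Gs definition in Table I and is reused by the local ranking of Section III-E.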
method, images being reciprocal K-nearest neighbors are connected with an edge. The edge is weighted by Jaccard similarity for evaluating the retrieval quality of each individual feature. Multiple rank lists are merged through the graph, and reranking is achieved by PageRank or Maximizing Weighted Density. However, the effectiveness of this method varies dramatically with the parameter K. Moreover, it also suffers from bad features. To be resistant to the noise, a Co-Regularized Multi-Graph Learning framework [15] is proposed, incorporating intra-graph and inter-graph constraints in a supervised way. Furthermore, a simple and effective fusion method at score level is proposed in [16]. Through a reference codebook constructed off-line, feature effectiveness is estimated on-the-fly in a query-adaptive manner.

Differently, to be robust in the fusion, we propose a rank-level fusion method without supervision. We adopt the framework of [17], and our work departs from the prior arts as follows. Firstly, in contrast with the undirected graph used in [17], we construct a directed graph, denoted as ImageGraph. The former [17] builds on K-reciprocal neighbors, which may result in low search recall. Instead, our method uses the K nearest neighbors, thus more candidates can be included. Secondly, instead of Jaccard similarity [17], in our approach the edge weight between pairwise images is defined as Bayes similarity built on Rank Distance, a more effective measurement to evaluate the retrieval quality. Thirdly, local ranking is performed on the fused ImageGraph, which aims at local optimization and improves the robustness of our method.

III. OUR METHOD

Before describing our approach in detail, we formulate our problem here. Our target is to obtain a new rank according to multiple search results, which can be defined as:

    r = h(R; D),                                   (1)

where R = {r1, r2, ..., rM} denotes the set of rank lists resulting from M different methods. In the offline process, we take each image in the database as a query and get the search result. Then, for each image, we find its K nearest neighbors in the database. The pre-computed search result of the database images is denoted as D; D represents the relevance among the database images.

For method i, where i = 1, 2, ..., M, its ImageGraph Gi is constructed based on the rank result ri and the pre-computed relevance among the database images Di, which is written as:

    Gi = (ri; Di).                                 (2)

Specifically, G can be written as the combination of the multiple individual graphs G1, G2, ..., GM:

    G = (G1, G2, ..., GM).                         (3)

Finally, the new rank list is calculated through ranking on the ImageGraph G:

    r = g(G).                                      (4)

In this section, we first present Rank Distance and Bayes similarity in Section III-A and Section III-B. Then we elaborate the construction of ImageGraph in Section III-C, and introduce fusion via ImageGraph in Section III-D. Finally, the ranking algorithm is described in Section III-E. For clarity, we list several important notations and their definitions, used throughout the paper, in Table I.

A. Rank Distance

Since different features may produce scores diverse in numerical values, it is difficult to compare or weigh their importance. Moreover, the initial search list usually contains false positive images, especially when retrieval quality is bad, and thus similarity score is not reliable for representing the relevance between images. To address this issue, we propose the Rank Distance to serve as a rank-level measurement. Let I = {I1, I2, ..., IN} denote the image dataset, and NK(Im) denote the K nearest neighbors of Im, where m = 1, 2, ..., N. N is the number of dataset images. Since the local densities of vectors around Im and In are different, In ∈ NK(Im) does not imply Im ∈ NK(In). It is demonstrated in [12, 54] that reciprocal neighborhood, i.e., In ∈ NK(Im) and Im ∈ NK(In), is a much stronger indicator of two images being relevant than unidirectional neighborhood. In this paper, we do not require the reciprocal neighbor relation. Instead, we calculate the distance of two images based on their ranks obtained when each one is used as query to search for the other. Rank Distance is defined as below:

    d(Im, In) = (R(Im, In) + R(In, Im)) / (2N),    (5)
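A minimal sketch of Eq. (5), assuming the full off-line rank list of every database image is available (the helper names are ours):

```python
def rank_of(rank_list, img):
    """R(I_m, I_n): 1-based position of img in I_m's rank list."""
    return rank_list.index(img) + 1

def rank_distance(rank_lists, m, n):
    """d(I_m, I_n) = (R(I_m, I_n) + R(I_n, I_m)) / (2N), Eq. (5).

    rank_lists[m] is the full ranking of the N database images when
    I_m is used as the query.
    """
    N = len(rank_lists)
    return (rank_of(rank_lists[m], n) + rank_of(rank_lists[n], m)) / (2.0 * N)
```

Because the two directions are averaged, d is symmetric even though each individual rank list is not, which is what makes it a usable distance between images.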
Fig. 3. Examples of Rank Distance. For each query, the top-5 ranked images under the baseline Cosine distance are illustrated, where many outliers are introduced. The numbers under the images denote their Rank Distances (×10^-5) to the query. True match images are marked with a green dot, while outliers with red. It is clear that Cosine distance pushes outliers into the top ranks, but Rank Distance corrects this artifact by increasing the distance between the outliers and the query. It demonstrates that Rank Distance can evaluate the similarity effectively.

Fig. 4. Sample images in the Paris dataset for empirical study. The top and second rows demonstrate true matches of Eiffel. We can observe that viewpoint and illumination vary a lot among the true matches. The third and bottom rows show false matches of Eiffel.
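The Bayes similarity is defined as the posterior probability of two images being a true match, estimated through an empirical study on labelled pairs such as the Paris examples of Fig. 4. One plausible way to carry out such an estimate is to bin the labelled pairs by their distance and read the posterior off the bin statistics; the sketch below uses our own binning choices, which are not from the paper:

```python
import bisect

def estimate_posterior(true_ds, false_ds, num_bins=20):
    """Estimate P(true match | d) by histogramming labelled pairs.

    true_ds / false_ds: distances of known true / false match pairs,
    with d in (0, 1]. Returns one posterior estimate per bin.
    """
    edges = [i / num_bins for i in range(1, num_bins + 1)]
    t = [0] * num_bins   # true-match counts per bin
    f = [0] * num_bins   # false-match counts per bin
    for d in true_ds:
        t[min(bisect.bisect_left(edges, d), num_bins - 1)] += 1
    for d in false_ds:
        f[min(bisect.bisect_left(edges, d), num_bins - 1)] += 1
    # Posterior per bin: fraction of pairs in the bin that are true matches.
    return [ti / (ti + fi) if ti + fi else 0.0 for ti, fi in zip(t, f)]
```

In such a scheme, small distances give a posterior near 1 and large distances a posterior near 0, matching the separation the true/false match histograms are meant to show.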
is sensitive to K. Therefore, we propose the ImageGraph structure to encode the image-level relationships. In our approach, we take into account the K nearest neighbors and [...] G = (V, E, w). V = {v1, v2, ..., vN} indicates the set of vertices, where vm is the corresponding vertex of image Im. E is the set of edges. If In belongs to NK(Im), there is a [...] threshold. Here, the depth of ImageGraph means the shortest path [...] Algorithm 1.

[Figure: percentage histograms for (a) true match and (b) false match image pairs, binned by distance; the numeric axis labels are not recoverable from the extraction.]

Algorithm 1 Construction of ImageGraph
Off-line: [...]
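From the recoverable description (each vertex points to its K nearest neighbors, edges are weighted by Bayes similarity, and the graph is expanded layer by layer up to depth P), the off-line construction of Algorithm 1 might look like the following sketch; the callback names and the breadth-first expansion policy are our assumptions:

```python
from collections import deque

def build_image_graph(query, knn, bayes_sim, K, P):
    """Directed ImageGraph of breadth K and depth P around `query`.

    knn(m)          -> list of nearest neighbors of image m (off-line).
    bayes_sim(m, n) -> edge weight w(m, n), the Bayes similarity.
    Each reached vertex is connected to its top-K neighbors; expansion
    stops once a vertex lies P hops from the query.
    """
    vertices, edges, weight = {query}, set(), {}
    frontier = deque([(query, 0)])
    while frontier:
        m, depth = frontier.popleft()
        if depth == P:          # do not expand beyond the graph depth
            continue
        for n in knn(m)[:K]:
            edges.add((m, n))
            weight[(m, n)] = bayes_sim(m, n)
            if n not in vertices:
                vertices.add(n)
                frontier.append((n, depth + 1))
    return vertices, edges, weight
```

With P = 2 and K = 3 this reproduces the two-layer, 3-neighbor structure of the toy example in Fig. 2.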
D. Fusion of Multiple ImageGraphs

As the rank result is encoded in the ImageGraph, we can fuse multiple rank results efficiently via graph fusion. To this end, we combine the multiple graphs Gi = (Vi, Ei, wi) obtained with different features without supervision [17]. The fused graph is denoted as G = (V, E, w), which can be written as:

    V = ∪i Vi,  E = ∪i Ei.                         (11)

E. Local Ranking

PageRank [49] is a query-independent link analysis method, ranking on the whole graph. Ranking by maximizing weighted density [17] starts from the query, and ranks a subset of the graph related to the specific query. It sorts the nodes by their degrees, i.e., the sum of the weights of the connected edges. However, these ranking methods suffer from outliers. A large K or bad features may bring a lot of outliers into the graph. Usually, there are many edges linked among these irrelevant images,
which is called the Tightly-Knit Community Effect. In this situation, Ranking by maximizing weighted density [17] and the link analysis method [17] may deviate from the query and up-rank the noise.

To tackle this problem, we adopt a safe strategy to perform local-based ranking. The proposed ranking only considers a local optimum instead of the global maximum, avoiding being confused by the tightly connected outliers. Since a higher edge weight reflects a higher relevance to the query, we naturally aim to find the maximum-weighted subgraph Gs starting from q. The subgraph Gs = (Vs, Es, w) is induced by the vertex set Vs ⊆ V; Es contains every edge between the vertices in Vs. We also define the candidate set C as the vertices that Vs points to. Specifically, we initialize the subgraph Gs^0 = ({q}, ∅, w), and C^0 contains the vertices connected by q. At the (i+1)-th iteration, the vertex in C^i which introduces the maximum weighted edges is included into Gs^(i+1), denoted as vs^(i+1):

    vs^(i+1) = arg max_{vs^(i+1) ∈ C^i} [ Σ_{(vm,vn) ∈ Es^(i+1)} w(vm, vn) - Σ_{(vm,vn) ∈ Es^i} w(vm, vn) ].    (13)

This procedure continues until the number of nodes in Gs satisfies the user's requirement. The nodes are ranked according to their order of being incorporated into Gs. The algorithm of local ranking is illustrated in Algorithm 2.

Algorithm 2 Local Ranking
1: Initialize subgraph Gs^0 as ({q}, ∅, w) and C^0 as the vertices that q points to.
2: At the (i+1)-th iteration, the vertex in C^i which introduces the maximum weighted edges is added into Gs^(i+1) according to Eq. 13.
3: Update Gs^(i+1) and C^(i+1).
4: Repeat step 2 and step 3 until the number of nodes in Gs satisfies the user's requirement.
5: Output Gs. The vertices are ranked according to their order of being incorporated into Gs.

IV. DATASETS AND BASELINES

A. Datasets

To evaluate the effectiveness of our approach, we conduct experiments on Holidays [25], UKBench [18], Oxford [27] and Flickr 1M [25].

Holidays. The Holidays dataset consists of 1,491 personal holiday images, 500 of which are queries. Most queries have fewer than 4 ground truth images, undergoing various changes. The Average Precision (AP) is used to evaluate the retrieval performance of each query. It is calculated as the area under the Precision-Recall curve. For all the query images, the APs are averaged, yielding the mean Average Precision (mAP). mAP is employed to measure the retrieval accuracy on the dataset.

UKBench. The UKBench dataset contains 10,200 images of 2,550 objects. Each object has 4 images with different viewpoints and illuminations. In this dataset, each image serves as a query. The performance is measured by the N-S score (maximum 4), which is the recall of the top-4 candidate images.

Oxford. The Oxford Buildings dataset consists of 5,062 images obtained by searching for particular Oxford landmarks on Flickr. This dataset has comprehensive ground truth for 11 different landmarks, each containing 5 possible queries. Each query has many true match images, taken from different viewpoints. Some of them have partial occlusion or distortion. Retrieval accuracy is measured by mean Average Precision (mAP).

Flickr 1M. The Flickr 1M dataset includes 1 million images arbitrarily collected from Flickr. This dataset can be added to the above datasets as distractors for large-scale experiments.

B. Features and Baselines

In this paper, we exploit four features: GIST [36], HSV Histogram, Convolutional Neural Network (CNN) and Bag-of-Words (BoW).

GIST. To compute the GIST descriptor, we resize the images to 256 × 256 following [16]. An l2-normalized 512-dim GIST descriptor is extracted for each image using 4 scales and 8 orientations. Nearest neighbor search is performed based on cosine distance.

HSV. For each image, we compute a 1000-dim HSV color histogram using 20 × 10 × 5 bins for the H, S and V components, respectively. The l2-normalized histogram is used for nearest neighbor search with cosine distance.

CNN. For an input image, we extract the l2-normalized 4096-dim CNN descriptor from the 6-th layer of the Caffe network [48]. Similarly, cosine distance is defined as the similarity function of images. Besides, we also fine-tune the CNN feature following [53]. The re-trained feature is denoted as CNN*.

BoW. For Holidays and UKBench, a 200K codebook is trained on the Flickr60K [25] dataset. A 128-bit Hamming signature [25] of each SIFT descriptor is embedded in the inverted file to filter out false matches. The Hamming threshold and weighting parameter are set to 52 and 26, respectively. For Oxford 5K, a 1M codebook is trained on the Paris6K dataset [26]. Moreover, rootSIFT [21], the burstiness strategy [10], multiple assignment [9] and pIDF [24] are employed on both datasets to enhance the performance.

Search results on three datasets are presented in Table II. It shows that BoW achieves good performance, obtaining 80.05% in mAP, 3.583 in N-S score, and 75.31% in mAP on Holidays, UKBench and Oxford, respectively. By contrast, GIST leads to poor performance on these datasets. It yields 34.14% in mAP, 1.856 in N-S score, and 12.96% in mAP on the three datasets, respectively. Moreover, HSV and CNN result in moderate accuracy on Holidays and UKBench. Note that the global features, i.e., HSV, GIST and CNN, do not work well on Oxford. This is because most images in Oxford contain buildings, which are difficult to describe using global features. After fine-tuning, the performance of CNN is improved consistently on the three datasets. Specifically, on Oxford, the re-trained feature CNN* enhances the original performance by about 10% in mAP.
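The two evaluation protocols used above can be sketched directly: AP as the discrete area under the precision-recall curve of a single query, and the UKBench N-S score as the number of true matches among the top-4 results (a minimal sketch; the image ids are arbitrary):

```python
def average_precision(ranked, relevant):
    """AP of one rank list: mean of precision@k over the positions of
    the true-match hits (the usual discrete form of the area under
    the precision-recall curve)."""
    hits, precisions = 0, []
    for k, img in enumerate(ranked, start=1):
        if img in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def ns_score(ranked, relevant):
    """UKBench N-S score: true matches among the top-4 (maximum 4)."""
    return sum(1 for img in ranked[:4] if img in relevant)
```

mAP is then just the mean of `average_precision` over all queries, and the dataset-level N-S score the mean of `ns_score`.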
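Looking back at Section III-E, the greedy expansion of Algorithm 2 and Eq. (13), which repeatedly absorbs the candidate vertex that adds the largest total edge weight to the growing subgraph and ranks vertices by their order of incorporation, can be sketched as follows (tie-breaking and container choices are ours):

```python
def local_ranking(query, edges, weight, num_results):
    """Greedy local ranking (Algorithm 2): grow G_s from the query,
    at each step absorbing the candidate that maximizes the weight
    gain of Eq. (13); return vertices in order of incorporation."""
    in_subgraph = {query}
    order = []
    candidates = {n for (m, n) in edges if m == query}
    while candidates and len(order) < num_results:
        def gain(v):
            # Total weight of edges between v and the current subgraph,
            # i.e., the increase of the subgraph weight if v is absorbed.
            return sum(w for (m, n), w in weight.items()
                       if (m == v and n in in_subgraph)
                       or (n == v and m in in_subgraph))
        best = max(candidates, key=gain)
        in_subgraph.add(best)
        order.append(best)
        candidates.discard(best)
        candidates |= {n for (m, n) in edges
                       if m == best and n not in in_subgraph}
    return order
```

Because only edges touching the current subgraph contribute to the gain, a tightly-knit community of outliers elsewhere in the graph cannot pull the ranking away from the query, which is the robustness argument of Section III-E.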
TABLE II
PERFORMANCE OF BASELINES ON THREE DATASETS

Datasets          GIST    HSV     CNN     CNN*    BoW
Holidays, mAP(%)  34.14   61.21   69.22   72.34   80.05
UKBench, N-S      1.856   3.195   3.397   3.502   3.582
Oxford, mAP(%)    12.96   13.29   44.56   54.14   75.31

[Figure: mAP(%) curves on (a) Holidays and (b) Oxford; the numeric axis labels are not recoverable from the extraction.]

[...] shown in Fig. 8. It is evident that the fusion brings consistent benefit to various feature combinations.

On Holidays, by combining GIST, HSV, and CNN, the BoW performance is boosted to 85.51%, 87.71%, and 88.48% in mAP, respectively. Note that fusion of two global features also boosts the overall performance. For HSV and CNN, which have moderate performance, their combination achieves an mAP of 79.09%. It improves the individual baselines of HSV and CNN by 17.88% and 9.87%, respectively. When the bad feature GIST is merged, the fusion still yields stable improvement. After being fused with GIST, the performance of BoW, HSV, and CNN is increased by 5.46%, 7.01% and 1.96% in mAP, respectively. Similar results can be observed on UKBench. The N-S score [...]
Fig. 8. Fusion results of two features on (a) Holidays, (b) UKBench and (c) Oxford. Six feature combinations are presented, i.e., BoW + GIST, BoW + HSV, BoW + CNN, HSV + CNN, HSV + GIST and CNN + GIST. The green bar and blue bar represent the results of the first feature and the second feature, respectively, while the yellow bar shows the fusion result.
Fig. 9. Comparison with graph fusion ([17]) and score fusion ([16]). Five feature combinations are presented on (a) Holidays and (b) UKBench. The yellow bar represents the BoW baseline, while the blue bar, orange bar and gray bar show the results of graph fusion, score fusion and our method, respectively.

[...] graph fusion and our approach enhance the BoW baseline by 0.089 and 0.121 in N-S score, respectively. Good features bring further benefit in the fusion. When combined with HSV, BoW is increased by 0.228, 0.173, and 0.252 in N-S score using graph fusion, score fusion, and our method, respectively. Similarly, fusion with CNN brings a benefit of 0.301, 0.22, and 0.325 in N-S score through graph fusion, score fusion, and our method, respectively. When all features are fused together, the three methods attain N-S scores of 3.894, 3.841, and 3.916, respectively.

In summary, compared to graph fusion, our method not only [...]

D. Evaluation of Robustness

In this section, we demonstrate the robustness of our approach to outliers. It is shown in [52] that the graph fusion approach is robust to random noise. In the experiments of [52], random noise is added to the rank results of the features. Specifically, the retrieved results are replaced with randomly assigned values. In our method, the outliers refer to the natural noise that exists in the original rank result. Natural noise is usually caused by the feature itself. Compared to random noise, natural noise is more difficult to tackle.

The outliers in ImageGraph are introduced in two ways. On one hand, when K is larger than the number of ground truths, a lot of outliers would be included in the graph. Thus, we first evaluate the fusion results when K varies, as illustrated in Fig. 10. In order to validate our method, we compare our results with graph fusion [17].

It is shown in Fig. 10 that graph fusion is very sensitive to the parameter K. On Holidays, its performance decreases when K gets large. On UKBench, the N-S score first rises with K and then drops after reaching a peak at K = 4. It implies that the graph fusion method achieves its best performance when K is about the ground truth number of the dataset. However, when K becomes large and more outliers are introduced, the performance drops significantly. Additionally, fusion with the bad feature, i.e., B+G, leads to a more rapid descent, compared to the combinations B+G+H and B+G+H+C.

In comparison, the performance of our method increases with K, and then keeps stable. On Holidays, when K = 20, our method yields mAPs of 84.69%, 88.04% and 90.18% using the combinations B+G, B+G+H and B+G+H+C, respectively, while graph fusion decreases to 48.96%, 67.10% and 74.58%, respectively. On UKBench, for the three combinations, our method keeps the performance of 3.678, 3.836 and 3.904 in N-S score at K = 20, compared to 2.746, 3.328 and 3.581 for graph fusion. It illustrates the robustness of our [...]
1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2660244, IEEE
Transactions on Image Processing
ZIQIONG LIU et al.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 11
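As a reminder of the two metrics reported in the tables of this section: mAP is the mean of per-query average precision, and the N-S score used on UKBench counts the true matches among the top-4 results (each UKBench query has exactly four relevant images). The sketch below, on hypothetical toy rank lists, shows both computations:

```python
# Hypothetical sketch of the evaluation metrics used in Tables III-V.

def average_precision(ranked, relevant):
    """Average precision of a single rank list."""
    hits, precisions = 0, []
    for i, img in enumerate(ranked, start=1):
        if img in relevant:
            hits += 1
            precisions.append(hits / i)  # precision at each hit
    return sum(precisions) / len(relevant)

def ns_score(ranked_lists, relevant_sets):
    """UKBench N-S score: mean number of true matches in the top-4."""
    return sum(
        len(set(ranked[:4]) & relevant_sets[q])
        for q, ranked in ranked_lists.items()
    ) / len(ranked_lists)

# Toy data: two queries, four relevant images each (as on UKBench).
ranked_lists = {
    "q1": ["a", "b", "x", "c", "d"],  # 3 true matches in the top-4
    "q2": ["e", "f", "g", "h"],       # 4 true matches in the top-4
}
relevant_sets = {"q1": {"a", "b", "c", "d"}, "q2": {"e", "f", "g", "h"}}

print(ns_score(ranked_lists, relevant_sets))  # (3 + 4) / 2 = 3.5
```

An N-S score of 4 thus means every query retrieves all four of its true matches in the top-4, which is why the scores in Table III saturate near 3.9.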
TABLE V
POST-PROCESSING TIME ON HOLIDAYS + 1M DATASET

Methods    Ours   [17]   [52]   [16]   [15]   [12]
Time (ms)  5.36   1      1      10     2210   30

Each image ID costs about 21 bits. In large scale image search, we store 4 nearest neighbors of each image, so 105 bits are needed per image per feature. The memory cost of 1 million images for a single feature is about 0.09GB. It usually takes 5.2ms for ImageGraph construction and 0.16ms for ranking, which is relatively small compared with the query time. The situation is similar on UKBench and Oxford.

Table IV compares the average query time and memory cost on the 1 million dataset with the state-of-the-art methods. Since we use four features in our experiments, the average query time is about 0.868s. Note that query time depends on many factors, such as the machine used and the number of features, so it is not directly comparable, but it can roughly indicate the time efficiency of the proposed approach. Moreover, our method is a post-processing algorithm, which works on a given rank list. We compare the time of the post-processing steps of the proposed method with the other post-processing methods considered in Table IV; Table V shows the result. The post-processing steps of our method, i.e., ImageGraph construction and ranking, cost 5.36ms. Most of the post-processing methods in Table V cost a few milliseconds, except [15], because [15] uses a supervised framework, which costs a lot of time to build the anchors.

The memory cost of the proposed method is about 0.36GB. The approach of [16] evaluates the retrieval quality online with a score curve, rather than the neighborhood relationship; it only stores the reference book, which costs 0.076GB extra memory. Since both [14] and [11] store binary signatures of features in the inverted file, their memory costs are 6.1GB. A lot of image-level information is stored in [12], whose cost is 22.35GB. Besides, our method adopts the same framework as [15, 17, 52], so the memory costs of these methods are similar to ours in theory.

VI. CONCLUSIONS

This paper proposes a graph-based method for robust feature fusion at the rank level. We first define the Rank Distance to measure the relevance of images at rank level. Then, based on it, we introduce the Bayes similarity to evaluate the retrieval quality of individual features. For each feature, an ImageGraph is constructed to model the relationship among images, in which an image is connected to its K nearest neighbors by edges, and the edges are weighted with Bayes similarity. Multiple rank lists resulting from different methods are fused via the ImageGraph. On the fused ImageGraph, images are re-ordered by local ranking, which further protects the fusion from outliers. Through extensive experiments on three benchmark datasets, we show that significant improvement can be achieved when multiple features are fused. Moreover, we demonstrate that our method is robust to outliers, which are usually brought in by bad features or inappropriate parameters. Our method obtains an mAP of 90.89%, an N-S score of 3.920, and an mAP of 84.92% on the Holidays, UKBench, and Oxford datasets, respectively. In the large scale experiments, we yield an mAP of 77.82% on Holidays + Flickr 1M. This shows that our method outperforms two popular fusion schemes, i.e., graph fusion [17] and score fusion [16], and the results are competitive with the state-of-the-art.

In future work, we will investigate how to efficiently update the ImageGraph structure when new images are added to the database or old images are deleted from it. In addition, more effort will be made to explore feature selection strategies in the fusion.

Acknowledgements. This work was supported by the Initiative Scientific Research Program of Ministry of Education under Grant No. 20141081253, in part to Dr. Qi Tian by ARO grant W911NF-15-1-0290 and Faculty Research Gift Awards by NEC Laboratories of America and Blippar, and in part by the National Science Foundation of China (NSFC) under Grant 61429201.

I. Concept detection

To further validate our method, we perform concept detection experiments on the Flickr25000 dataset [56]. We randomly select 2000 images from the dataset as queries; the remaining images serve as the database. For each query, we calculate its distance to each concept class using the image-to-category distance [55]. After obtaining the rank lists of the different features, we fuse them with the proposed method. Here we still use mAP to measure the performance. BoW, CNN, HSV and GIST obtain 32.1%, 42.9%, 14.6% and 9.4% in mAP, respectively. The CNN feature achieves the best performance for the concept detection task. Fused with BoW, HSV and GIST, the CNN result is improved to 49.2%, 42.6% and 42.8% in mAP, respectively. The fusion of the four features obtains an mAP of 50.5%.

REFERENCES

[1] Y. Wang and G. Mori. A discriminative latent model of object classes and attributes. In Proceedings of the IEEE European Conference on Computer Vision, 2010.
[2] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In Proceedings of the IEEE European Conference on Computer Vision, 2010.
[3] F. Yu, R. Ji, M.-H. Tsai, G. Ye, and S.-F. Chang. Weak attributes for large-scale image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[4] D. Parikh and K. Grauman. Relative attributes. In Proceedings of the IEEE International Conference on Computer Vision, 2011.
[5] F. Jing and S. Baluja. VisualRank: Applying PageRank to large-scale image search. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, no.7, pp.1877-1890, 2008.
[6] J. Wang, Y.-G. Jiang, and S.-F. Chang. Label diagnosis through self tuning for web image search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[7] A. Kovashka, D. Parikh, and K. Grauman. WhittleSearch: Image search with relative attribute feedback. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[8] X. Shen, Z. Lin, J. Brandt, S. Avidan, and Y. Wu. Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[9] H. Jegou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, vol.87, no.3, pp.316-336, 2008.
Fig. 12. Examples of retrieval results from the Holidays (top), UKBench (middle) and Oxford (bottom) datasets. For each query, the top-10 ranked images obtained with GIST (first row), HSV (second row), CNN (third row), BoW (fourth row) and ImageGraph feature fusion (fifth row) are shown. True matches are marked with a green dot, and false matches with a red dot.
TABLE III
FUSION RESULTS OF DIFFERENT FEATURE COMBINATIONS ON BENCHMARKS.
Feature Combinations Holidays, mAP (%) UKBench, N-S Oxford, mAP (%) Holidays+Flickr1M, mAP (%)
BoW + GIST 85.51 3.703 79.08 69.91
BoW + HSV 87.71 3.843 80.06 75.25
BoW + CNN 88.48 3.907 81.27 76.17
BoW + CNN* 89.31 3.913 84.80 77.08
BoW + GIST + HSV 88.52 3.855 79.97 75.26
BoW + GIST + CNN 88.76 3.905 81.96 76.24
BoW + GIST + CNN* 89.40 3.914 84.80 77.09
BoW + HSV + CNN 90.02 3.916 82.01 77.21
BoW + HSV + CNN* 90.89 3.920 84.91 77.82
BoW + GIST + HSV + CNN 90.28 3.916 82.05 77.22
BoW + GIST + HSV + CNN* 90.89 3.920 84.92 77.82
TABLE IV
PERFORMANCE COMPARISON WITH THE STATE-OF-THE-ART.
Methods Ours [17] [52] [16] [15] [14] [13] [11] [12] [10] [9]
Holidays, mAP(%) 90.89 84.64 84.64 87.98 84.7 85.8 80.1 85.2 - 84.8 84.8
UKBench, N-S 3.920 3.77 3.83 3.841 3.75 3.85 - 3.79 3.67 3.64 3.55
Oxford, mAP(%) 84.92 - - - 84.3 - 85.0 - 81.4 68.5 74.7
Holidays + 1M, mAP(%) 77.82 - - 75.06 79.4 69.0 - - - 77.0 42.3
Query time (s) 0.868 0.749 0.749 - - 1.413 - 0.145 - - 0.65
Memory cost (GB) 0.36 - - 0.076 - 6.1 - 6.1 22.35 - -
[10] H. Jegou, M. Douze, and C. Schmid. On the burstiness of visual elements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[11] L. Zheng, S. Wang, and Q. Tian. Coupled binary embedding for large-scale image retrieval. IEEE Transactions on Image Processing, vol.23, no.8, pp.3368-3380, 2014.
[12] D. Qin, S. Gammeter, L. Bossard, T. Quack, and L. Van Gool. Hello neighbor: accurate object retrieval with k-reciprocal nearest neighbors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[13] D. Qin, C. Wengert, and L. Van Gool. Query adaptive similarity for large scale object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[14] L. Zheng, S. Wang, Z. Liu, and Q. Tian. Packing and padding: Coupled multi-index for accurate image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[15] C. Deng, R. Ji, W. Liu, D. Tao, and X. Gao. Visual reranking through weakly supervised multi-graph learning. In Proceedings of the IEEE International Conference on Computer Vision, 2013.
[16] L. Zheng, S. Wang, L. Tian, F. He, Z. Liu, and Q. Tian. Query-adaptive late fusion for image search and person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[17] S. Zhang, M. Yang, T. Cour, K. Yu, and D. N. Metaxas. Query specific fusion for image retrieval. In Proceedings of the IEEE European Conference on Computer Vision, 2012.
[18] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[19] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, vol.60, no.2, pp.91-110, 2004.
[20] K. Mikolajczyk and C. Schmid. Scale & affine invariant interest point detectors. International Journal of Computer Vision, vol.60, no.1, pp.63-86, 2004.
[21] R. Arandjelovic and A. Zisserman. Three things everyone should know to improve object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[22] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[23] J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In Proceedings of the IEEE International Conference on Computer Vision, 2003.
[24] L. Zheng, S. Wang, Z. Liu, and Q. Tian. Lp-norm IDF for large scale image search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[25] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In Proceedings of the IEEE European Conference on Computer Vision, 2008.
[26] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[27] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[28] C. Wengert, M. Douze, and H. Jegou. Bag-of-colors for improved image search. In Proceedings of ACM Multimedia, 2011.
[29] S. Zhang, M. Yang, X. Wang, Y. Lin, and Q. Tian. Semantic-aware co-indexing for near-duplicate image retrieval. In Proceedings of the IEEE International Conference on Computer Vision, 2013.
[30] M. Douze, A. Ramisa, and C. Schmid. Combining attributes and Fisher vectors for efficient image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[31] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014.
[32] L. Zheng, S. Wang, J. Wang, and Q. Tian. Accurate image search with multi-scale contextual evidences. International Journal of Computer Vision, vol.120, no.1, pp.1-13, 2016.
[33] L. Zheng, S. Wang, and Q. Tian. Lp-norm IDF for scalable image retrieval. IEEE Transactions on Image Processing, vol.23, no.8, pp.3604-3617, 2014.
[34] L. Zheng, Y. Yang, and Q. Tian. SIFT meets CNN: A decade survey of instance retrieval. arXiv:1608.01807, 2016.
[35] L. Zheng, S. Wang, Z. Liu, and Q. Tian. Fast image retrieval: query pruning and early termination. IEEE Transactions on Multimedia, vol.17, no.5, pp.648-659, 2015.
[36] A. Oliva and A. Torralba. A holistic representation of the spatial envelope. International Journal of Computer Vision, vol.42, no.3, pp.145-175, 2001.
[37] M. Douze, H. Jegou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of GIST descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009.
[38] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2008.
[39] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, no.1, pp.117-128, 2011.
[40] L. Pauleve, H. Jegou, and L. Amsaleg. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, vol.31, no.11, pp.1348-1358, 2010.
[41] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proceedings of the IEEE International Conference on Computer Vision, 2007.
[42] L. Xie, Q. Tian, W. Zhou, and B. Zhang. Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb. Computer Vision and Image Understanding, vol.124, pp.31-41, 2014.
[43] L. Xie, Q. Tian, W. Zhou, and B. Zhang. Heterogeneous graph propagation for large-scale web image search. IEEE Transactions on Image Processing, vol.24, no.11, pp.4287-4298, 2015.
[44] C. Huang, Y. Dong, H. Bai, L. Wang, N. Zhao, S. Cen, and J. Zhao. An efficient graph-based visual reranking. In IEEE ICASSP, 2013.
[45] W. Liu, Y.-G. Jiang, J. Luo, and S.-F. Chang. Noise resistant graph ranking for improved web image search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[46] W. H. Hsu, L. S. Kennedy, and S.-F. Chang. Video search reranking through random walk over document-level context graph. In Proceedings of the ACM International Conference on Multimedia, 2007.
[47] S. C. Hoi, W. Liu, and S.-F. Chang. Semi-supervised distance metric learning for collaborative image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[48] Y. Jia. Caffe: An open source convolutional architecture for fast feature embedding. http://caffe.berkeleyvision.org/, 2013.
[49] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web, 1999.
[50] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, vol.46, no.5, pp.604-632, 1999.
[51] Z. Liu, S. Wang, L. Zheng, and Q. Tian. Visual reranking with improved image graph. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
[52] S. Zhang, M. Yang, T. Cour, K. Yu, and D. N. Metaxas. Query specific rank fusion for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, no.4, pp.803-815, 2015.
[53] A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky. Neural codes for image retrieval. In Proceedings of the IEEE European Conference on Computer Vision, 2014.
[54] H. Jegou, C. Schmid, H. Harzallah, and J. Verbeek. Accurate image search using the contextual dissimilarity measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, no.1, pp.2-11, 2010.
[55] L. Xie, R. Hong, B. Zhang, and Q. Tian. Image classification and retrieval are one. In Proceedings of the ACM International Conference on Multimedia Retrieval, 2015.
[56] M. J. Huiskes and M. S. Lew. The MIR Flickr retrieval evaluation. In Proceedings of the ACM International Conference on Multimedia Information Retrieval, 2008.
[57] D. Li, W.-C. Hung, J.-B. Huang, S. Wang, N. Ahuja, and M.-H. Yang. Unsupervised visual representation learning by graph-based consistent constraints. In Proceedings of the IEEE European Conference on Computer Vision, 2016.
[58] D. Li, J.-B. Huang, Y. L. Li, S. Wang, and M.-H. Yang. Weakly supervised object localization with progressive domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

Ziqiong Liu received the bachelor's degree in Information Engineering from Southeast University, Nanjing, China, in 2011. She is currently pursuing the Ph.D. degree in Electronic Engineering at Tsinghua University, Beijing, China. Her current research interests include image/video processing and large-scale multimedia retrieval.

Shengjin Wang received the B.E. degree from Tsinghua University, China, in 1985 and the Ph.D. degree from the Tokyo Institute of Technology, Tokyo, Japan, in 1997. From May 1997 to August 2003, he was a member of the research staff at the Internet System Research Laboratories, NEC Corporation, Japan. Since September 2003, he has been a Professor with the Department of Electronic Engineering, Tsinghua University. He has published more than 80 papers on image processing, computer vision, and pattern recognition, and holds ten patents. His current research interests include image processing, computer vision, video surveillance, and pattern recognition.

Liang Zheng received the Ph.D. degree in Electronic Engineering from Tsinghua University, China, in 2015, and the B.E. degree in Life Science from Tsinghua University, China, in 2010. He was a postdoctoral researcher at the University of Texas at San Antonio, USA. He is currently a postdoctoral researcher in Quantum Computation and Intelligent Systems, University of Technology Sydney, Australia. His research interests include image retrieval, classification, and person re-identification.

Qi Tian (SM'04) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2002. He is currently a Professor in the Department of Computer Science at the University of Texas at San Antonio (UTSA). Dr. Tian's research interests include multimedia information retrieval and computer vision. He has served as a Program Chair, Session Chair, Organizing Committee Member, and TPC member for over 120 IEEE and ACM conferences, including ACM Multimedia, SIGIR, ICCV, and ICASSP. He has been a Guest Co-Editor of IEEE Transactions on Multimedia, Computer Vision and Image Understanding, ACM Transactions on Intelligent Systems and Technology, and the EURASIP Journal on Advances in Signal Processing, and is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology and a member of the Editorial Board of the Journal of Multimedia.