
The 19th Korea-Japan Joint Workshop on Frontiers of Computer Vision

Intensity Comparison Based Compact Descriptor for Mobile Visual Search


Sang-il Na, Keun-dong Lee, Seung-jae Lee, Sung-kwan Je, and Weon-geun Oh
Creative Content Research Lab., ETRI, Daejeon, Rep. of Korea
{sina, zaCUIT, seungjlee, skj, owg}@etri.re.kr
Abstract - In this paper, we propose an intensity comparison based compact descriptor for mobile visual search. For practical mobile applications, low complexity and a small descriptor size are preferable, and many algorithms such as SURF, CHoG, and PCA-SIFT have been proposed. However, these approaches focused not on the feature description itself but on the extraction time and the size of the feature. This paper proposes a feature description method based on simple intensity comparisons, designed with descriptor size and extraction speed in mind. Experimental results show that the proposed method has performance comparable to SURF with similar complexity and a roughly 20 times smaller descriptor.

Keywords - feature descriptor, image matching
I. INTRODUCTION

Feature-based object recognition has attracted much attention since the Scale Invariant Feature Transform (SIFT) was proposed, and its applications and performance have been reported in the literature [1]. After SIFT, many modifications were proposed, followed by efficient search structures and performance comparisons [2]-[13]. With the popularity of mobile networks and devices, compactness and low extraction complexity have become major considerations in feature descriptor design. For mobile applications, a feature descriptor should satisfy the following properties:

- Robustness: the visual descriptor is robust against different lighting conditions and against partial occlusion by moving objects, e.g. pedestrians and cars.
- Discriminability: if two image patches come from different parts of an object, their feature descriptors should be significantly different.
- Fast extraction: computing power is limited on mobile devices, so the algorithm must have low complexity.
- Compactness: when local features are sent over a network, system latency can be reduced by sending fewer bits, which compact local features make possible. Also, when the DB is stored on the device, compact descriptors allow more images to be stored.

Figure 1 shows the scenarios for using a feature descriptor on a mobile device: (a) shows the server-side solution, (b) shows extraction done on the device with the local features sent over the network, and (c) shows the on-device scenario [14].

Fig. 1. Mobile visual search pipeline: (a) everything is done on the server; (b) the descriptor is extracted on the device and matching is done on the server; (c) everything is done on the mobile device.

In cases (b) and (c), extraction speed and compactness are the main concerns. In previous research, SURF achieves low complexity, but its descriptor is too large to use on mobile devices. PCA-SIFT and CHoG have compact descriptors, but these methods require post-processing after the raw descriptor extraction [15][16]. In this paper, we present a local descriptor based on comparison. The proposed descriptor uses average intensity value comparisons, which are robust to various conditions.

This research was supported by the ICT Standardization program of MKE (the Ministry of Knowledge Economy).




This paper is organized as follows. Section 2 explains the proposed feature descriptor. Section 3 describes the experimental conditions and results, and Section 4 concludes the paper.

II. PROPOSED METHOD

This section describes the proposed comparison-based compact descriptor extraction method.

Fig. 2. The proposed descriptor overview

A. Assumptions

Generally, the relative relationships of the luminance component are maintained in a local region after modifications in the pixel domain. The modification of pixel values can be approximated by equation (1):

$I_t(x, y) = \alpha I(\tilde{x}, \tilde{y}) + \beta$   (1)

where $I_t(x, y)$ is the modified pixel value at location $(x, y)$, $(\tilde{x}, \tilde{y})$ is the location in the original image corresponding to $(x, y)$ in the modified image, $\alpha$ is the contrast change, and $\beta$ is the brightness change. In local feature extraction, geometric modifications such as scaling, rotation, and translation are compensated by the feature detector, so we design the feature descriptor without considering geometric modification; for this reason, $(\tilde{x}, \tilde{y})$ and $(x, y)$ are the same. If the modification is linear, equation (1) can be approximated by the following equations:

$I_t(x, y) = \alpha I(x, y) + \beta$   (2)

$I_t(x, y) = \sum_{i=-M/2}^{M/2} \sum_{j=-M/2}^{M/2} \alpha_{i,j}\, I(x+i, y+j)$   (3)

These equations represent brightness and contrast changes (2) and convolutional filtering (3), respectively, where $\alpha$ and $\beta$ are the same as in equation (1), $M$ is the filter size, and $\alpha_{i,j}$ is a magnitude factor. The equations above mean that the relationships between values at different positions do not change even if such modifications occur in the local region. However, some modifications, such as noise addition, do not follow this model; in that case the block average is used and the rule is maintained. Based on these assumptions, the proposed method builds a binary descriptor by local intensity comparison, and different types of comparison patterns are built to increase discriminability. The detailed descriptor extraction is described as follows.

B. Descriptor Extraction

Fig. 2 shows the proposed feature descriptor extraction flow. The goal of a feature descriptor is to robustly capture salient information from a canonical image patch. The image patch is the local region extracted by a feature point extractor; when extracting the region, the extractor normalizes it to its scale and main orientation.

Fig. 3. The sub-block division

The image patch is divided into 4x4 sub-blocks, and comparison patterns are generated as shown in Fig. 3. The comparison patterns are generated using three sub-block values, i.e. the average and the x- and y-direction difference values computed inside each sub-block. The x-direction difference is the difference between the left-half and right-half pixel values; the y-direction difference is calculated the same way using the bottom-half and top-half pixel values. We convert these values into a binary descriptor using comparisons, carrying out a binarization for each comparison, so if one block is corrupted by noise it influences just one bit of the feature. The outline of the procedure for generating the bits is as follows (a code sketch is given below). For the x- and y-directional differences, if the value is positive the bit is set to '1', otherwise to '0'; these bits capture local characteristics of the image patch. The comparison pairs make use of point symmetry about the origin; for example, block 0's value is compared with block 15's value, and for this comparison the average, the x-directional difference, and the y-directional difference are all used. We also compare larger regions, such as the sum of the left-side sub-blocks against the sum of the right-side sub-blocks.
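Since the exact set of 99 comparisons is not fully enumerated above, the following is a minimal sketch of the binarization scheme, assuming a normalized grayscale patch whose side is divisible by 4; the sign bits and point-symmetric pairs follow the description, while the function names and the particular subset of comparisons are ours.

```python
import numpy as np

def subblock_values(patch, grid=4):
    """Per sub-block statistics of a square grayscale patch:
    average, x-direction difference (left half minus right half),
    and y-direction difference (bottom half minus top half)."""
    h, w = patch.shape
    bh, bw = h // grid, w // grid
    avg = np.empty((grid, grid))
    dx = np.empty((grid, grid))
    dy = np.empty((grid, grid))
    for r in range(grid):
        for c in range(grid):
            b = patch[r*bh:(r+1)*bh, c*bw:(c+1)*bw].astype(np.float64)
            avg[r, c] = b.mean()
            dx[r, c] = b[:, :bw // 2].sum() - b[:, bw // 2:].sum()
            dy[r, c] = b[bh // 2:, :].sum() - b[:bh // 2, :].sum()
    return avg, dx, dy

def extract_bits(patch):
    """Binarize the sub-block statistics (illustrative subset of the
    comparisons; the paper's full pattern yields 99 bits)."""
    avg, dx, dy = subblock_values(patch)
    bits = [int(v > 0) for v in dx.ravel()]       # 16 x-direction sign bits
    bits += [int(v > 0) for v in dy.ravel()]      # 16 y-direction sign bits
    for values in (avg, dx, dy):                  # point-symmetric pairs,
        flat = values.ravel()                     # e.g. block 0 vs block 15
        bits += [int(flat[i] > flat[15 - i]) for i in range(8)]
    # one large-region comparison: left-half sum vs right-half sum
    bits.append(int(avg[:, :2].sum() > avg[:, 2:].sum()))
    return np.array(bits, dtype=np.uint8)         # 57 bits in this sketch
```

For a 32x32 normalized patch each sub-block is 8x8 pixels; additional symmetric and large-region pairs of the same kind would bring the count up to the 99 bits used in the paper.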

Using the above method, we generate a 99-bit descriptor for each image patch.

C. Descriptor Matching

The Hamming distance is used to measure the similarity between a reference descriptor and a query descriptor; its value is the number of bits that differ from the reference.
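A minimal NumPy sketch of this distance, for both unpacked and byte-packed bit strings (the function names are ours):

```python
import numpy as np

def hamming_distance(a, b):
    """Number of differing bits between two descriptors stored as
    uint8 arrays with one bit per entry (e.g. the output of extract_bits)."""
    return int(np.count_nonzero(a != b))

def hamming_distance_packed(a, b):
    """Same distance for descriptors packed 8 bits per byte with
    np.packbits: XOR the bytes, then count the set bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```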



The number of bit errors between descriptors extracted from different local regions then follows a binomial distribution $B(n, p)$, where $n$ is the number of extracted bits and $p$ is the probability that a '0' or '1' bit is extracted. If $n$ is sufficiently large, the binomial distribution can be approximated by a normal distribution; its mean is $np$ and its standard deviation is $\sqrt{np(1-p)}$. From this it can be deduced that the bit error rate (BER) has a normal distribution with mean $\mu = p$ and standard deviation $\sigma = \sqrt{p(1-p)/n}$. For the approximated normal distribution $N(\mu, \sigma)$, the false alarm rate $P_{FA}$ for a BER threshold $T$ is given in (4) [17]:

$P_{FA} = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\mu - T}{\sqrt{2}\,\sigma}\right)$   (4)
To decide whether two descriptors match, we use two criteria. First, if the distance is lower than a predefined threshold, we assume the descriptors are matched; this threshold is very strict, set so that $P_{FA}$ is lower than $10^{-12}$. The second criterion is the usual way to decide matches between local features: the nearest neighbor distance ratio (NNDR), one of the most popular methods for finding matching points, which we also use.
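The following sketch works equation (4) with concrete numbers: for the 99-bit descriptor and p = 0.5, a BER threshold around 0.14 already pushes P_FA below 10^-12. The threshold and NNDR ratio values here are our assumptions for illustration, not values reported in the paper.

```python
import math

def false_alarm_rate(T, n=99, p=0.5):
    """P_FA of equation (4) for a BER threshold T."""
    mu, sigma = p, math.sqrt(p * (1 - p) / n)
    return 0.5 * math.erfc((mu - T) / (math.sqrt(2) * sigma))

# Criterion 1: strict distance threshold.
# false_alarm_rate(0.14) is roughly 8e-13, i.e. below the 1e-12 target.
def match_by_threshold(bit_errors, n=99, T=0.14):
    return bit_errors / n < T

# Criterion 2: nearest neighbor distance ratio (NNDR); the 0.8 ratio
# is a conventional choice, not one stated in the paper.
def match_by_nndr(d_first, d_second, ratio=0.8):
    return d_first < ratio * d_second
```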
III. EXPERIMENTAL RESULTS

Fig. 5. Example images of the DB used for retrieval

To evaluate the proposed algorithm, SURF was used to detect feature points and obtain image patches. We designed two experiments as follows.

The first is a pair-wise matching experiment, in which the true positive rate (TPR) and false positive rate (FPR) were calculated to measure performance; equations (5) and (6) define TPR and FPR. In this experiment, if an image pair successfully yields a homography matrix by RANSAC [18], we count the pair as a match (a verification sketch is given after the equations).

$TPR = \dfrac{\#\text{ of matches}}{\text{total }\#\text{ of matching pairs}}$   (5)

$FPR = \dfrac{\#\text{ of matches}}{\text{total }\#\text{ of non-matching pairs}}$   (6)
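As a hedged sketch of the match decision described above, using OpenCV's RANSAC homography estimation [18]; the minimum inlier count and the reprojection threshold are assumptions, since the paper only states that a successfully estimated homography counts as a match.

```python
import numpy as np
import cv2

def is_matching_pair(pts_query, pts_ref, min_inliers=10):
    """Fit a homography between matched keypoint locations with RANSAC
    and accept the image pair if a model is found with enough inliers.
    pts_query, pts_ref: N x 2 arrays of corresponding point coordinates."""
    if len(pts_query) < 4:  # a homography needs at least 4 correspondences
        return False
    H, inlier_mask = cv2.findHomography(
        np.asarray(pts_query, np.float32).reshape(-1, 1, 2),
        np.asarray(pts_ref, np.float32).reshape(-1, 1, 2),
        cv2.RANSAC, 5.0)    # assumed 5-pixel reprojection threshold
    return H is not None and int(inlier_mask.sum()) >= min_inliers
```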

For this test, we used five categories of the Stanford DB [19]: CD covers, DVD covers, book covers, business cards, and text documents, with 600 matching pairs per category and 6,000 non-matching pairs across all categories. Fig. 4 shows example images from this DB.

The second experiment is retrieval. For this test, we built a DB of 1,500 reference images and 1,900 query images; the reference and query images were captured by DSLR and mobile devices, respectively. Example images are shown in Fig. 5. For efficient search, a KD-tree was used as the search structure and best-bin-first (BBF) [20] was used to find approximate nearest neighbors. We checked the top result for each query and counted it as a success if it showed the same object.

Tables 1 and 2 show the pair-wise matching and retrieval results, respectively. As the results show, the pair-wise matching performance (TPR, FPR) and the retrieval performance (success ratio) of the proposed method are comparable to SURF. The OpenSURF [21] implementation was used for this experiment.
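A sketch of the retrieval-side search, assuming OpenCV's FLANN matcher as a stand-in for the KD-tree with best-bin-first search [20]; the `checks` parameter bounds how many leaves are visited, playing the role of BBF's limited backtracking, and the tree/check counts are assumptions. Since the proposed descriptor is binary, a Hamming-space index would be the more natural production choice; the float KD-tree here mirrors the paper's setup.

```python
import numpy as np
import cv2

# Reference descriptors stacked row-wise; FLANN's KD-tree works on float32,
# so the 99-bit descriptors are expanded to one float per bit here.
ref_descriptors = np.random.randint(0, 2, (10000, 99)).astype(np.float32)  # placeholder DB

index_params = dict(algorithm=1, trees=4)   # 1 = FLANN_INDEX_KDTREE
search_params = dict(checks=50)             # BBF-style bounded backtracking
matcher = cv2.FlannBasedMatcher(index_params, search_params)

def query_top2(query_descriptors):
    """Two approximate nearest reference descriptors per query descriptor,
    as needed for the NNDR test; the top retrieval result is then the
    reference image that accumulates the most accepted matches."""
    return matcher.knnMatch(query_descriptors.astype(np.float32),
                            ref_descriptors, k=2)
```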

Fig. 4. Example images of the Stanford DB



Table 3 shows the average descriptor size; as shown, the proposed descriptor is about 20 times smaller than SURF. The reported size includes position information for each feature, so the actual descriptor itself is even more compact relative to SURF.
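A quick back-of-the-envelope check on the roughly 20x figure, assuming SURF's standard 64-dimensional float descriptor (the per-feature byte counts are our arithmetic, not figures from the paper):

```python
surf_bytes = 64 * 4        # 64 float32 components: 256 bytes per feature
ours_bytes = 99 / 8        # 99 bits: about 12.4 bytes per feature
print(surf_bytes / ours_bytes)   # ~20.7, matching the reported ratio
```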
TABLE I. TEST RESULT FOR PAIR-WISE MATCHING

DB:            CD cover   DVD cover   Book cover   Business card   Printed document
SURF TPR:      0.865      0.94        0.933        0.76            0.592
Proposed TPR:  0.862      0.925       0.898        0.845           0.567
SURF FPR (over all categories): 0.0118
Proposed FPR (over all categories): 0.0112

TABLE II. TEST RESULT FOR RETRIEVAL (%)

Success rate:  SURF 91.50   Proposed 93.93

TABLE III. AVERAGE DESCRIPTOR SIZE (KBYTE)

Size of descriptor:  SURF 45.40   Proposed 2.35

TABLE IV. PAIR-WISE MATCHING TIME (MSEC)

Time:  SURF 420.4   Proposed 259.3

TABLE V. RETRIEVAL TIME (SEC)

Extraction time:  SURF 0.09    Proposed 0.07
Searching time:   SURF 0.078   Proposed 0.015

Extraction and matching times for pair-wise matching and retrieval are shown in Tables 4 and 5, respectively. The pair-wise matching time includes both descriptor extraction and matching time; in the retrieval case, the extraction time is the query descriptor extraction time. As the results show, the proposed method is efficient in both feature extraction and matching.

IV. CONCLUSIONS

In this paper, we propose a new feature descriptor. The key idea is to extract a local feature descriptor by comparing local intensities. As shown in the experimental results, this yields fast extraction and a compact descriptor with performance similar to previous work. Since extraction time and descriptor size are important issues in mobile visual search applications, the proposed method is well suited to mobile environments.
REFERENCES

[1] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91-110, 2004.
[2] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in Proc. European Conf. Computer Vision (ECCV), Graz, Austria, May 2006.
[3] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.
[4] K. Mikolajczyk and C. Schmid, "An affine invariant interest point detector," in Proc. Seventh European Conf. Computer Vision (ECCV), pp. 128-142, 2002.
[5] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, July 2002.
[6] D. Nister and H. Stewenius, "Scalable recognition with a vocabulary tree," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2006.
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, "Total recall: Automatic query expansion with a generative feature model for object retrieval," in Proc. IEEE Int. Conf. Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
[8] T. Yeh, J. J. Lee, and T. J. Darrell, "Adaptive vocabulary forests for dynamic indexing and category learning," in Proc. IEEE Int. Conf. Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
[9] H. Jegou, M. Douze, and C. Schmid, "Hamming embedding and weak geometric consistency for large scale image search," in Proc. European Conf. Computer Vision (ECCV), 2008.
[10] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, "Lost in quantization: Improving particular object retrieval in large scale image databases," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, June 2008.
[11] H. Jegou, M. Douze, and C. Schmid, "Improving bag-of-features for large scale image search," Int. J. Comput. Vis., vol. 87, no. 3, pp. 316-336, Feb. 2010.
[12] H. Jegou, M. Douze, C. Schmid, and P. Perez, "Aggregating local descriptors into a compact image representation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 2010.
[13] D. Chen, S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, "Residual enhanced visual vector as a compact signature for mobile visual search," Signal Processing.
[14] M. Bober, G. Cordara, and Y. A. Reznik, "Information on MPEG exploration on Compact Descriptors for Visual Search," 2nd Workshop on Mobile Visual Search (MVS), January 2010.
[15] Y. Ke and R. Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2004.
[16] V. Chandrasekhar, G. Takacs, D. M. Chen, S. S. Tsai, R. Grzeszczuk, and B. Girod, "CHoG: Compressed histogram of gradients - a low bit rate feature descriptor," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Miami, FL, June 2009.
[17] J. S. Seo, J. Haitsma, T. Kalker, and C. D. Yoo, "A robust image fingerprinting system using the Radon transform," Signal Processing: Image Communication, vol. 19, pp. 325-339, 2004.
[18] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Comm. of the ACM, vol. 24, no. 6, pp. 381-395, June 1981.
[19] V. Chandrasekhar, D. Chen, S. S. Tsai, N. M. Cheung, H. Chen, G. Takacs, Y. Reznik, R. Vedantham, R. Grzeszczuk, J. Bach, and B. Girod, "The Stanford mobile visual search dataset," in Proc. ACM Multimedia Systems Conference, San Jose, CA, February 2011.
[20] J. S. Beis and D. G. Lowe, "Shape indexing using approximate nearest-neighbour search in high-dimensional spaces," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Puerto Rico, 1997, pp. 1000-1006.
[21] C. Evans, OpenSURF, http://www.chrisevansdev.com/

