
2010 14th International Conference Information Visualisation

YACBIR: Yet Another Content Based Image Retrieval system


Samy Ait-Aoudia1, Ramdane Mahiou1, Billel Benzaid2

1 ESI - Ecole nationale Supérieure en Informatique, BP 68M, Oued-Smar 16270 Algiers, Algeria
s_ait_aoudia@esi.dz, r_mahiou@esi.dz
2 Université de Bourgogne, B.P. 47870, 21078 Dijon Cedex, France
billel_benzaid@etu.u-bourgogne.fr

Abstract. Vision is central to human perception. Images are everywhere. Real-life applications produce and use huge amounts of images of different types. Retrieving an image having some characteristics in a big database is a crucial task. We therefore need mechanisms for indexing and retrieving images. CBIR (Content Based Image Retrieval) systems perform these tasks by indexing images using automatically extracted physical characteristics and by searching with an image query. We present a CBIR system named YACBIR (Yet Another CBIR) that combines several properties (color, texture and points of interest) extracted automatically to index and retrieve images.

Keywords. Indexing Images, Retrieving Images, CBIR (Content Based Image Retrieval) system, Similarity measure.

1550-6037/10 $26.00 © 2010 IEEE DOI 10.1109/IV.2010.83

1. INTRODUCTION

Real-life applications produce and use huge amounts of images of different types. The following examples give some classical operations: retrieving a medical image having a pathological aspect [6], electronic commerce, identification in forensics, ... Retrieving an image having some characteristics in a big database is a crucial task. We therefore need mechanisms for indexing and retrieving images.

Searching for an image among a collection of images can be done by different approaches. Classical methods use only keyword indexing. Some search engines allow content-based image retrieval based on visual features.

Textual indexing consists in associating words with a given image. This allows the retrieval of images based on textual queries. Well-known search engines such as Google Images or Yahoo Images Search use textual information to retrieve images from large collections of images. Searching for plane photos with such systems is done by entering the keyword "plane" in the search box. For example, Yahoo Images Search returns 9,885,401 images for the query "plane". Filters on color and image size can be used to narrow the results. However, this type of indexing induces several problems. The person who indexes images and the end user who searches for an image can choose different terms to describe the same concept. Another problem is image polysemy; as stated by Sara Shatford [12], "the delight and frustration of pictorial resources is that a picture can mean different things to different people".

Content indexing can be alternative or complementary to textual indexing. Searching for an image by visual content is central to CBIR (Content Based Image Retrieval) systems [1-3,5,7-11,13]. In CBIR systems, searching a collection of images for a specific image is done with an image query. The search engine extracts the visual characteristics of the image (texture, color, form, ...) and searches for similar images in the database. We give hereafter some examples of CBIR systems, both commercial systems and research systems with demonstration versions available.

QBIC(TM) (Query By Image Content) [4] is now a product of IBM Corporation (footnote 1). The QBIC system allows queries of large image databases based on visual content properties such as color percentages, color layout, and textures occurring in the images.

VisualSEEk (footnote 2) is a joint spatial-feature image search engine developed at Columbia University. VisualSEEk finds the images that contain the most similar arrangements of similar regions. In the indexing phase, the system automatically extracts and indexes salient color regions from the images.

This paper is organized as follows. We give in section 2 the general architecture of a CBIR system. In section 3, the concepts used by our CBIR system named YACBIR are given. Experimental results on sample datasets are given in section 4. Section 5 gives conclusions.

1 http://wwwqbic.almaden.ibm.com/
2 http://www.ee.columbia.edu/ln/dvmm/researchProjects/MultimediaIndexing/VisualSEEk/VisualSEEk.htm

2. CBIR SYSTEM

A Content Based Image Retrieval system generally consists of two main phases, as illustrated by Fig. 1: an indexing phase performed off-line and a retrieval phase performed on-line. Images are indexed using the physical characteristics (color, texture, shape, ...) of each image in the database. These descriptors are extracted automatically from the image content. The query is an example image. The results are images from the database that are similar to the query image according to predefined criteria. Choosing good indexes is thus a very important matter.
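The two phases described above can be sketched in Python as follows. This is only a minimal illustration, not the YACBIR implementation: the function names `build_index` and `retrieve`, the `extract` and `similarity` callables, and the `top_k` parameter are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Descriptor = Sequence[float]

def build_index(images: Dict[str, object],
                extract: Callable[[object], Descriptor]) -> Dict[str, Descriptor]:
    """Off-line phase: compute and store one descriptor per database image."""
    return {name: extract(img) for name, img in images.items()}

def retrieve(query_image: object,
             index: Dict[str, Descriptor],
             extract: Callable[[object], Descriptor],
             similarity: Callable[[Descriptor, Descriptor], float],
             top_k: int = 5) -> List[Tuple[str, float]]:
    """On-line phase: describe the query, score every indexed image, rank."""
    query_descriptor = extract(query_image)
    scored = [(name, similarity(query_descriptor, d)) for name, d in index.items()]
    # Highest similarity first, as in a query-by-example result list.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In this sketch the descriptor extractor and the similarity function are passed in as parameters, so the same two-phase skeleton can be reused with any combination of color, texture or shape features.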

Figure 1. CBIR architecture.

3. YACBIR

The YACBIR system combines three characteristics of an image to compute a weighted similarity measure. These characteristics are color, texture and points of interest. The color and texture characteristics used are global, while points of interest are local shape characteristics.

3.1. Color

The color characteristic is widely used in general-purpose CBIR systems. We have chosen HSV as the color space. The indexer module quantizes colors and creates a histogram specific to each image. The similarity measure is computed on histograms. The similarity between the histogram $H[I_{req}]$ of the query image R and the histogram $H[I_{cand}]$ of a candidate image I is given by:

$$\mathrm{Inter}\left(H[I_{req}], H[I_{cand}]\right) = \sum_{c} \min\left(H[I_{req}](c),\, H[I_{cand}](c)\right)$$

where $H[I_{req}](c)$ is the number of pixels of image $I_{req}$ having the color c.

To have a score between 0 and 1, this measure is divided by the number of pixels in the image to yield the color similarity $S_C$. The value 1 means that the two images are similar.

3.2. Texture

For the texture characteristic, four descriptors are used: contrast, entropy, energy and inverse differential moment. We have used the Euclidean distance to measure the similarity between the query image r and an image i. The similarity measure is given by:

$$S(V_r, V_i) = \sqrt{\sum_{i=1}^{4} \left(V_r(i) - V_i(i)\right)^2}$$

where $V_r = \{V_r(i),\ 1 \le i \le 4\}$ and $V_i = \{V_i(i),\ 1 \le i \le 4\}$ represent the selected texture descriptors of image r and image i respectively. The value 0 for this measure means that the textures of the two images are analogous. This measure is normalized into $N(V_r, V_i)$ to obtain a value varying between 0 and 1. To have the same meaning as the color similarity measure, we use the texture descriptor given below:

$$S_T = 1 - N(V_r, V_i)$$

The value 1 means that the two images have similar texture.

3.3. Points of interest

The points of interest analyzer is based on the Harris detector. The Harris detector gives the points of the image that present an abrupt change in contour direction. These points are computed using the Harris matrix (auto-correlation matrix) of the image in the neighborhood of the considered pixel. If P1, P2 are two points of interest from two images and V1, V2 their invariant vectors, the Mahalanobis distance is given by:

$$D_m(V_1, V_2) = \sqrt{(V_1 - V_2)^{T}\, M^{-1}\, (V_1 - V_2)}$$

The number of correct matching points between two images quantifies this similarity measure. If this value is low, the similarity is poor. If this value is high relative to the total number of interest points, the similarity is good. This similarity measure is also normalized to take values between 0 and 1. We denote this similarity measure $S_s$.

3.4. Similarity Measure

The similarity measure used in YACBIR is a weighted sum of the color, texture and points of interest (shape) similarity measures. This similarity measure is given by:

$$S = \alpha\, S_C + \beta\, S_T + \gamma\, S_s \quad \text{with} \quad \alpha + \beta + \gamma = 1$$
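As a rough illustration of the measures defined in Section 3, the following Python sketch computes a histogram-intersection color score, a texture score derived from the Euclidean distance, a Mahalanobis distance between two invariant vectors, and the weighted combination of the three scores. Several points are assumptions made for the example and are not taken from YACBIR: the intersection is divided by the pixel count of the query image, the texture descriptors are assumed to be already scaled to [0, 1] so that the distance can be normalized by its largest possible value, the inverse matrix is supplied by the caller, and all function and parameter names are hypothetical.

```python
import math

def color_similarity(hist_query, hist_cand):
    """Histogram intersection divided by the query pixel count (color score)."""
    inter = sum(min(q, c) for q, c in zip(hist_query, hist_cand))
    n_pixels = sum(hist_query)  # assumption: normalize by the query image size
    return inter / n_pixels if n_pixels else 0.0

def texture_similarity(v_query, v_cand):
    """Texture score 1 - N, with N the Euclidean distance scaled to [0, 1].

    Assumption: each descriptor lies in [0, 1], so the largest possible
    distance between two n-dimensional vectors is sqrt(n).
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_query, v_cand)))
    return 1.0 - dist / math.sqrt(len(v_query))

def mahalanobis(v1, v2, m_inv):
    """sqrt((v1 - v2)^T M^-1 (v1 - v2)), with M^-1 given as nested lists."""
    d = [a - b for a, b in zip(v1, v2)]
    n = len(d)
    quad = sum(d[i] * m_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

def combined_similarity(s_c, s_t, s_s, alpha, beta, gamma):
    """Weighted sum of color, texture and shape scores; weights must sum to 1."""
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("weights must sum to 1")
    return alpha * s_c + beta * s_t + gamma * s_s
```

For instance, `color_similarity([4, 2, 2, 0], [2, 2, 3, 1])` sums the bin-wise minima (2 + 2 + 2 + 0 = 6) and divides by the 8 pixels of the query, giving 0.75.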


The choice of these weighting parameters depends on the query image. They can also be set automatically based on a general description of the image.

4. EXPERIMENTAL RESULTS

The evaluation of the YACBIR system is made by issuing image queries to retrieve similar images from various image databases. The source image is shown at the top left in all the examples below. Its similarity measure is naturally equal to 1.0 (similarity with itself). The images resulting from each query are shown beside the source image. They are sorted in descending order of the similarity coefficient.

4.1. Images from Alamy database

The following test examples use images taken from the Alamy database (http://www.alamy.com). The first example (given in Fig. 2) searches for an image representing a yellow flower. The first similar images are returned with a high similarity coefficient. The second test example (shown in Fig. 3) searches for an image representing a man in the sand desert. The first similar images are related to the sand desert.

4.2. Images from COIL database

The following test examples use images taken from the COIL (Columbia Object Image Library) database available at Columbia University (http://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php). This database contains 7200 color images of one hundred 3D objects in 72 different positions (5° rotation steps). Some objects of the COIL database are presented in Fig. 4. The first example searches for the COIL cat. The first similar images all show the COIL cat in different positions (Fig. 5). The second example searches for a COIL car. The first similar images all show COIL cars (Fig. 6).

Figure 2. A yellow flower



Figure 3. A man in the sand desert.

Figure 4. COIL database 3D objects.



Figure 5. Rotating COIL cat.

Figure 6. Rotating COIL car.



5. CONCLUSION

CBIR systems are many and diverse. A variety of physical characteristics are used to index images. One system can use region histograms while another uses color coherence vectors. For some systems, no details are available. The collection of images used in the tests can influence the results. With a given CBIR system, searching for example for a cat image in a database containing only dogs always yields images of dogs.

This paper attempts to evaluate the performance of the YACBIR system on sample datasets of images. The system gives good results on the tests conducted. Further tests must be conducted on varied and large databases to obtain a more accurate evaluation.

The indexing technique is a crucial part of a CBIR system. Images can be indexed using a wide variety of attributes concerning color (color moments, color coherence vector, dominant colors, ...), texture (edge statistics, random field decomposition, local binary patterns, ...) and shape (elastic models, bounding boxes, template matching, ...). Efficiency must also be evaluated by using or adding other characteristics related to color, texture and shape in the YACBIR system. Comparison with other CBIR systems must be conducted on the same data to allow an impartial judgment.

To obtain a more powerful and efficient retrieval system for image and multimedia databases, content-based queries must be combined with text and keyword predicates.

6. REFERENCES

[1] G. Carneiro, A.B. Chan, P.J. Moreno and N. Vasconcelos, "Supervised Learning of Semantic Classes for Image Annotation and Retrieval", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 3, March 2007, pp. 394-410.
[2] R. Datta, D. Joshi, J. Li and J.Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age", ACM Computing Surveys, vol. 40, no. 2, Article 5, April 2008.
[3] R. Datta, J. Li and J. Wang, "Content-based image retrieval: approaches and trends of the new age", Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, Singapore, 2005.
[4] M. Flickner et al., "Query by Image and Video Content: The QBIC System", Computer, Sept. 1995, pp. 23-32.
[5] Y. Liu, D. Zhang, Guojun Lu and W.Y. Ma, "A survey of content-based image retrieval with high-level semantics", Pattern Recognition, vol. 40, no. 1, January 2007, pp. 262-282.
[6] W.R. Hersh, H. Müller, J.R. Jensen, J. Yang, P.N. Gorman and P. Ruch, "Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection", Journal of the American Medical Informatics Association, vol. 13, no. 5, Sep/Oct 2006.
[7] N.R. Howe, "Analysis and representations for automatic comparison, classification and retrieval of digital images", PhD Thesis, Cornell University, Ithaca, NY, USA, May 2001.
[8] M.S. Lew, N. Sebe, C. Djeraba and R. Jain, "Content-based Multimedia Information Retrieval: State of the Art and Challenges", ACM Transactions on Multimedia Computing, Communications, and Applications, Feb. 2006.
[9] J. Li and J.Z. Wang, "Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, September 2003.
[10] H. Müller, W. Müller, D. McG. Squire, S. Marchand-Maillet and T. Pun, "Performance evaluation in content-based image retrieval: overview and proposals", Pattern Recognition Letters, vol. 22, no. 5, April 2001, pp. 593-601.
[11] S.K. Saha, A.K. Das and B. Chanda, "CBIR using Perception based Texture and Colour Measures", ICPR'04, Proceedings of the 17th International Conference on Pattern Recognition, vol. 2, Cambridge, UK.
[12] S. Shatford, "Analyzing the subject of a picture: a theoretical approach", Cataloging & Classification Quarterly, vol. 6, no. 3, March 1986, pp. 39-62.
[13] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain, "Content-Based Image Retrieval at the End of the Early Years", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, 2000, pp. 1349-1380.
