Medical Image Retrieval Based On Latent Semantic Indexing

2008 International Conference on Computer Science and Software Engineering
Qin Chen, Xiaoying Tai, Baochuan Jiang, Gang Li, Jieyu Zhao
Institute of Information Science & Engineering
Ningbo University
Ningbo, Zhejiang, China
kokoyy97@yahoo.com.cn taixiaoying@nbu.edu.cn
Abstract: To improve the performance of content-based medical image
retrieval, this paper proposes an algorithm that applies latent
semantic indexing (LSI) to gastroscopic image retrieval. First, the
color histogram and color autocorrelogram are extracted as low-level
image features. Then normalization, term weighting and singular
value decomposition are used to map these low-level features into
high-level semantic features. In this way, the retrieval results
agree more closely with the semantic content of the query image.
Based on this idea, a prototype system that supports query by
example image was designed and implemented. Experimental results on
the prototype system show that the proposed approach is effective
for gastroscopic image retrieval.

Keywords: color histogram; color autocorrelogram; latent semantic
indexing; singular value decomposition

I. INTRODUCTION
With the development of modern information technology, a great
number of medical images are generated every day, and using these
images to assist diagnosis is a very important issue. Content-Based
Medical Image Retrieval (CBMIR) is the application of CBIR
technology in the medical field. To describe image content, CBMIR
usually extracts characteristics of the lesion (focus), such as
color, texture, shape and spatial relations [1]. These
characteristics form a low-level feature vector that serves as the
basis for indexing and matching.

There is a gap between how these low-level features describe a
lesion and how doctors describe it in diagnosis. As a result, using
the low-level features directly as the retrieval basis often gives
unsatisfactory results. It is therefore necessary to find a mapping
between low-level visual features and high-level semantic
information, so that the retrieval results agree more closely with
the doctor's diagnosis. For gastroscopic images, this paper first
extracts the color histogram and color autocorrelogram as low-level
features. It then uses latent semantic indexing to map them into
high-level semantic features. Experimental results on the prototype
system show that the proposed approach is effective for
gastroscopic image retrieval.

II. LOW-LEVEL FEATURES
A. Color histogram
This paper adopts a one-dimensional color histogram based on the HSV
color space. The RGB color space is converted into the HSV color
space [2] to obtain \(h \in [0, 360]\), \(s \in [0, 1]\) and
\(v \in [0, 1]\).

Red is the main color of gastroscopic images [3], while yellow and
green are the colors of gastric cancer cells. In the converted HSV
space, the h component concentrates on [0, 100] and [300, 360],
while the s and v components are distributed relatively
homogeneously. Based on these characteristics, we quantize the h
component non-uniformly into 16 levels, and the s and v components
uniformly into 4 levels each.

The three quantized components are then combined into a single
color index \(C = 16H + 4S + V\), where \(C \in [0, 255]\) is an
integer. The color histogram vector can then be expressed as
\(H = \{h[0], h[1], \ldots, h[255]\}\), where \(h[i]\) is the
fraction of the image's pixels whose quantized, combined color value
is \(i\).
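The quantization and histogram steps above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The paper says only that h is quantized non-uniformly into 16 levels, so the bin edges in the hypothetical `HUE_EDGES` are an assumption: they are denser over [0, 100] and [300, 360], where gastroscopic hues concentrate. The function names `quantize_hsv` and `color_histogram` are also illustrative. The RGB-to-HSV conversion uses Python's standard `colorsys` module.

```python
import colorsys
import numpy as np

# Hypothetical non-uniform hue bin edges in degrees (17 edges -> 16 levels).
# The paper only states that h is quantized non-uniformly into 16 levels;
# these edges are an assumption, denser over [0, 100] and [300, 360].
HUE_EDGES = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
                      300, 312, 324, 336, 348, 360], dtype=float)

def quantize_hsv(rgb):
    """Map an (rows, cols, 3) uint8 RGB image to color indices C = 16H + 4S + V."""
    pixels = rgb.reshape(-1, 3).astype(float) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels])  # h, s, v in [0, 1]
    h_deg = hsv[:, 0] * 360.0
    # H: index (0..15) of the hue bin containing h_deg.
    H = np.clip(np.searchsorted(HUE_EDGES, h_deg, side="right") - 1, 0, 15)
    # S, V: 4 uniform levels (0..3); s = 1 or v = 1 falls into the top level.
    S = np.minimum((hsv[:, 1] * 4).astype(int), 3)
    V = np.minimum((hsv[:, 2] * 4).astype(int), 3)
    return (16 * H + 4 * S + V).reshape(rgb.shape[:2])

def color_histogram(rgb):
    """256-bin histogram: h[i] = fraction of pixels whose color index C equals i."""
    c = quantize_hsv(rgb)
    return np.bincount(c.ravel(), minlength=256) / c.size
```

For example, take a uniformly red image with RGB value (200, 0, 0). It has h = 0, s = 1 and v ≈ 0.78, so every pixel maps to C = 16·0 + 4·3 + 3 = 15 and the histogram is 1 at index 15.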
B. Color correlogram [4]
A color correlogram expresses how the spatial correlation of pairs
of colors changes with distance.

Let I be an n × n image whose colors are quantized into m colors
\(c_1, c_2, \ldots, c_m\). For a pixel \(p = (x, y) \in I\), let
\(I(p)\) denote its color, and let \(I_c = \{p \mid I(p) = c\}\). The
distance between pixels \(p_1 = (x_1, y_1)\) and
\(p_2 = (x_2, y_2)\) is defined as
\(|p_1 - p_2| = \max\{|x_1 - x_2|, |y_1 - y_2|\}\). For
\(i, j \in [m]\) and \(k \in [d]\), the correlogram of I is defined
as:
\[\gamma^{(k)}_{c_i, c_j}(I) = \Pr_{p_1 \in I_{c_i},\, p_2 \in I}\left[p_2 \in I_{c_j} \,\middle|\, |p_1 - p_2| = k\right] \tag{1}\]
To compute the correlogram, the following count is needed:
\[\Gamma^{(k)}_{c_i, c_j}(I) = \left|\left\{p_1 \in I_{c_i},\ p_2 \in I_{c_j} \,\middle|\, |p_1 - p_2| = k\right\}\right| \tag{2}\]
The correlogram is then computed as
\[\gamma^{(k)}_{c_i, c_j}(I) = \frac{\Gamma^{(k)}_{c_i, c_j}(I)}{8k\, h_{c_i}(I)}, \qquad h_{c_i}(I) = n^2 \Pr_{p \in I}\left[p \in I_{c_i}\right].\]
The denominator is the total number of pixels at distance k from any
pixel of color \(c_i\). To reduce the space complexity, the
autocorrelogram was proposed. It captures the spatial correlation
between identical colors only and is defined as
\(\alpha^{(k)}_{c}(I) = \gamma^{(k)}_{c, c}(I)\). In our experiment,
we calculate the autocorrelogram at distance k = 1.

The project is sponsored by the National Natural Science Foundation
of China under Grant No. 60472099 and by the Ningbo Natural Science
Foundation under Grants 2006A610017 and 2004A610004.
978-0-7695-3336-0/08 $25.00 © 2008 IEEE
DOI 10.1109/CSSE.2008.1457

III. LATENT SEMANTIC INDEXING [5]
In text retrieval, LSI applies SVD to the term-document matrix. It
keeps the k largest singular values and their corresponding singular
vectors to construct a new matrix that approximates the
term-document matrix. Because the new matrix has removed noise and
reduced the original feature dimension, it gives better retrieval
performance.

How best to apply LSI to image retrieval is a problem worth studying
in depth. When LSI is extended to image retrieval, the term-document
matrix corresponds to a semantic-image matrix. This paper applies
LSI to the color histogram and color autocorrelogram of gastroscopic
images. It compares retrieval performance before and after
normalization, term weighting and singular value decomposition. The
results show that the method is indeed effective.

A. Normalization [6]
The purpose of normalization is to give each component of the
feature vector the same importance.

Suppose there are M images in the image database and each image has
a K-dimensional feature vector, so that the feature vector of the
m-th image is \(V_m = [v_{m1}, \ldots, v_{mk}, \ldots, v_{mK}]\). The
K-dimensional feature vectors of all images form the matrix
\(v = [v_{mk}]\) (\(m = 1, \ldots, M\), \(k = 1, \ldots, K\)).

Assume that each column vector \(v_k\) follows a Gaussian
distribution. First calculate its mean \(\mu_k\) and standard
deviation \(\sigma_k\). Then map each value into the range [-1, +1]
using
\[v'_{mk} = \frac{v_{mk} - \mu_k}{3\sigma_k}.\]
Under the Gaussian assumption, about 99.7% of the values lie within
\(\pm 3\sigma_k\) of the mean and therefore fall inside [-1, +1].
Generally, the value is then shifted into the interval [0, 1]:
\[v''_{mk} = \frac{v'_{mk} + 1}{2}.\]

B. Term weighting
In text retrieval, researchers often use term weighting to give
index terms different significance and thereby improve retrieval
performance. To apply this technique to image retrieval, suppose
there are M images \(I_1, I_2, \ldots, I_M\) from which we extract K
feature items \(w_1, w_2, \ldots, w_K\). The weight \(I_{ij}\) of
feature item \(w_i\) in image \(I_j\) is the product of a local
weight \(l_{ij}\) and a global weight \(g_i\):
\(I_{ij} = l_{ij} g_i\). The local weight is the significance of
feature item \(w_i\) in image \(I_j\). The global weight is the
significance of \(w_i\) in the whole image database.

Many weighting methods already exist. Some researchers report that
combining a logarithmic term-frequency local weight with an entropy
global weight gives the best performance. We adopt this combination
here:
Local weight (logarithmic term frequency): \(l_{ij} = \log(1 + f_{ij})\).
Global weight (entropy):
\[g_i = 1 + \frac{1}{\log M}\sum_{j=1}^{M} \frac{f_{ij}}{F_i} \log\frac{f_{ij}}{F_i},\]
where \(f_{ij}\) is the occurrence frequency of feature item \(w_i\)
in image \(I_j\), \(F_i\) is the occurrence frequency of \(w_i\) in
the whole image set, and \(0 \log 0\) is taken as 0.

C. Reducing the matrix rank using singular value decomposition
Reducing the rank of a matrix A by SVD has a useful mathematical
property. For a given k, the rank-k approximation of A obtained this
way is the rank-k matrix closest to A, i.e., it changes A the least
(in the Frobenius or spectral norm).

The SVD of an m × n matrix A is defined as \(A = U \Sigma V^T\),
where U is an m × m orthogonal matrix and V is an n × n orthogonal
matrix. The singular values of A are arranged in descending order
(\(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r\), where
\(r = \operatorname{rank}(A)\)) to form the m × n diagonal matrix
\(\Sigma\).

Let the first k left singular vectors (columns of U) form the m × k
matrix \(U_k\), and let the first k right singular vectors (columns
of V) form the n × k matrix \(V_k\). Let the k largest singular
values form the k × k diagonal matrix \(\Sigma_k\). The matrix
\(A_k\) is defined as:
\[A_k = U_k \Sigma_k V_k^T.\]
For k ≤ r, the rank of \(A_k\) is k, and \(A_k\) can be regarded as
the rank-k approximation of A. The decomposition is illustrated in
Fig. 1.

[Figure: A_k (m × n) drawn as the product of U_k (m × k), Σ_k (k × k) and V_k^T (k × n).]
Figure 1. Reducing the matrix rank using singular value
decomposition.

Reducing the matrix rank by SVD can remove much of the noise.
However, if the rank is too small, important information is lost.
The appropriate rank is usually decided empirically, by experiment
on each database.

D. Similarity metric
We use the cosine distance to measure the distance between the
semantic vector q of the query image and the semantic vectors of the
images in the database. If the semantic-image matrix A has column
vectors \(a_j\), \(j = 1, 2, \ldots, d\), the distance is:
\[D(a_j, q) = 1 - \cos(a_j, q) = 1 - \frac{a_j^T q}{\|a_j\| \, \|q\|} \tag{3}\]

IV. LOW-LEVEL FEATURES MAPPING INTO HIGH-LEVEL SEMANTICS
The difficulty of using LSI in image retrieval is deciding which
low-level image features should replace the words (terms) of text
retrieval. As Section II shows, the color histogram describes how
often each color appears in the image, just as word frequencies do
in text retrieval. We can therefore apply LSI to the color
histogram-image matrix to implement semantics-based image retrieval.

Suppose there are M images \(I_1, I_2, \ldots, I_M\) in the image
database. The algorithm is as follows:
(1) Calculate the color histogram of every image to form the matrix
\(A_{N \times M}\), where N is the dimension of the color histogram.
(2) Transpose \(A_{N \times M}\) and normalize it.
(3) Weight the result of (2).
(4) Apply SVD to the result of (3), choose an appropriate k to
discard redundant data and noise, and compose a new rank-k matrix.
(5) Retrieve images using the cosine similarity distance.
(6) Apply steps (1)-(5) to the color autocorrelogram as well.

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
In the experiment, we use 1345 gastroscopic images, 169 of which
show cancer and the rest do not. Our analysis shows that images with
and without cancer are intuitively distinguished by their color and
the spatial distribution of color. Therefore, we apply LSI based on
the color histogram and color autocorrelogram to retrieval. We
established a prototype system and designed the following
experiments on it:
(1) Retrieval based on the color histogram.
(2) Retrieval using normalization, based on (1).
(3) Retrieval using term weighting, based on (2).
(4) Retrieval using singular value decomposition, based on (3),
trying different values of k to compare retrieval performance.
(5) Comparison of the above results with retrieval based on the
color autocorrelogram.

For every query, the system returns the 15 images most similar to
the query image. Two images are judged similar if they contain the
same lesion region. For system evaluation, we use the retrieval
precision and the ranking measures average-r and average-p. The
retrieval result interface is shown in Fig. 2.

Figure 2. One retrieval result interface.

In the experiment, we choose 30 images with cancer from the database
as query images. We retrieve with the techniques described above and
finally calculate the average values.
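The distance-1 autocorrelogram of Section II-B can be sketched in Python with NumPy as follows. This is a minimal sketch, not the authors' code. It assumes the input is a 2-D array of quantized color indices, such as the C values of Section II-A. The function name `autocorrelogram` is hypothetical. It follows the normalization γ = Γ / (8k·h_{c_i}(I)) given in Section II-B. Border pixels have fewer than 8k neighbours inside the image, so they slightly lower the values.

```python
import numpy as np

def autocorrelogram(c, m=256, k=1):
    """Color autocorrelogram alpha_c^(k)(I) = gamma_{c,c}^(k)(I), following
    gamma = Gamma / (8k * h_c): Gamma counts ordered pixel pairs (p1, p2) that
    both have color c and lie at chessboard distance exactly k; h_c is the
    number of pixels of color c. `c` holds quantized color indices in [0, m)."""
    c = np.asarray(c, dtype=np.int64)
    rows, cols = c.shape
    gamma_count = np.zeros(m)
    # The 8k offsets on the boundary of the (2k+1) x (2k+1) square around p1.
    offsets = [(dx, dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)
               if max(abs(dx), abs(dy)) == k]
    for dx, dy in offsets:
        # Region where both p1 = (x, y) and p2 = (x + dx, y + dy) lie inside I
        # (clamped so that tiny images yield empty, equally shaped slices).
        x0 = max(0, -dx)
        x1 = max(x0, min(rows, rows - dx))
        y0 = max(0, -dy)
        y1 = max(y0, min(cols, cols - dy))
        p1 = c[x0:x1, y0:y1]
        p2 = c[x0 + dx:x1 + dx, y0 + dy:y1 + dy]
        gamma_count += np.bincount(p1[p1 == p2], minlength=m)
    h = np.bincount(c.ravel(), minlength=m)  # pixel count per color
    denom = 8 * k * h
    # Colors absent from the image get 0 instead of 0 / 0.
    return np.divide(gamma_count, denom, out=np.zeros(m), where=denom > 0)
```

Consider a 3 × 3 image of a single color. Its 40 same-color neighbour pairs at distance 1 give α = 40 / (8 · 9) ≈ 0.56 rather than 1, which shows the border effect of the 8k·h normalization.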
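Steps (2) to (5) of Section IV, together with the formulas of Section III, can be sketched with NumPy as below. This is an illustrative sketch, not the authors' code, and it rests on several assumptions:
- The helper names `normalize`, `log_entropy_weight`, `rank_k`, `cosine_distance` and `lsi_retrieve` are hypothetical.
- Clipping normalized values to [0, 1] and mapping constant feature columns to 0.5 are assumptions the paper does not state.
- The query is taken to be a database image, as in the experiments, where the 30 query images come from the database. Its semantic vector is therefore simply its own column of \(A_k\).

```python
import numpy as np

def normalize(X):
    """Section III-A: 3-sigma Gaussian normalization of each column (feature) of the
    M x K matrix X (one row per image), shifted from [-1, 1] to [0, 1]."""
    X = np.asarray(X, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # assumption: constant columns map to 0.5
    Xn = ((X - mu) / (3 * sigma) + 1) / 2
    return np.clip(Xn, 0.0, 1.0)             # assumption: values beyond 3 sigma are clipped

def log_entropy_weight(F):
    """Section III-B: I_ij = log(1 + f_ij) * g_i for a K x M matrix F
    (feature items as rows, images as columns), with the entropy global weight."""
    F = np.asarray(F, dtype=float)
    M = F.shape[1]
    Fi = F.sum(axis=1, keepdims=True)        # F_i: frequency of w_i in the whole set
    p = np.divide(F, Fi, out=np.zeros_like(F), where=Fi > 0)
    safe = np.where(p > 0, p, 1.0)           # log(1) = 0, so 0 * log 0 is taken as 0
    g = 1 + (p * np.log(safe)).sum(axis=1) / np.log(M)
    return np.log1p(F) * g[:, None]

def rank_k(A, k):
    """Section III-C: A_k = U_k Sigma_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def cosine_distance(a, q):
    """Eq. (3): D(a, q) = 1 - a^T q / (|a| |q|); zero vectors are treated as orthogonal."""
    denom = np.linalg.norm(a) * np.linalg.norm(q)
    return 1.0 if denom == 0 else 1.0 - float(a @ q) / denom

def lsi_retrieve(features, query_index, k, top=15):
    """Steps (2)-(5) of Section IV. `features` is the transposed M x N matrix
    (one histogram per row). Assumption: the query is a database image, so its
    semantic vector is its own column of A_k (no folding-in of outside queries)."""
    X = normalize(features)                  # step (2)
    W = log_entropy_weight(X.T)              # step (3): N x M, images as columns
    Ak = rank_k(W, k)                        # step (4): rank-k semantic-image matrix
    q = Ak[:, query_index]
    d = np.array([cosine_distance(Ak[:, j], q) for j in range(Ak.shape[1])])
    return [int(i) for i in np.argsort(d, kind="stable")[:top]]  # step (5)
```

Retrieval with the color autocorrelogram (step (6)) is the same call with the autocorrelogram matrix in place of the histogram matrix. As Section III-C notes, k has to be chosen per database.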
Tab. I shows the average retrieval performance for different values
of k when SVD is applied to the color histogram after normalization
and weighting. Tab. II shows the corresponding results for the color
autocorrelogram.

TABLE I. RETRIEVAL RESULTS OF COLOR HISTOGRAM USING LSI WITH DIFFERENT k
k      Precision   Average-r   Average-p
5      75.99%      11.14       0.810
10     82.88%      9.20        0.884
15     83.55%      7.85        0.886
20     82.00%      7.50        0.895
25     83.33%      7.32        0.899
30     83.99%      7.31        0.900
35     83.32%      7.38        0.898
40     83.77%      7.27        0.900
45     83.11%      7.31        0.899
50     82.66%      7.37        0.896
60     83.10%      7.39        0.896
80     83.11%      7.45        0.894
100    83.11%      7.47        0.894
120    83.11%      7.46        0.894
140    83.11%      7.46        0.894
160    83.11%      7.46        0.894
200    83.11%      7.46        0.894
256    83.11%      7.46        0.894
TABLE II. RETRIEVAL RESULTS OF COLOR AUTOCORRELOGRAM USING LSI WITH DIFFERENT k
k      Precision   Average-r   Average-p
5      58.90%      11.24       0.741
10     83.11%      6.40        0.919
15     75.77%      7.14        0.880
20     73.78%      7.55        0.859
25     70.89%      7.92        0.837
30     68.23%      8.12        0.821
35     67.11%      8.51        0.809
40     67.11%      8.63        0.802
45     68.89%      8.41        0.808
50     68.45%      8.59        0.805
60     68.00%      8.48        0.805
80     67.34%      8.48        0.804
100    67.77%      8.59        0.794
120    67.12%      8.67        0.793
140    67.12%      8.72        0.793
160    67.12%      8.69        0.795
200    67.12%      8.69        0.795
256    67.12%      8.69        0.795
For the color histogram using LSI, the average precision reaches its
maximum of 83.99% at k = 30, and for k greater than 100 all
retrieval measures remain constant. This suggests that the singular
values beyond the 100th carry only redundant data and can be
compressed away. The data between the 30th and the 100th singular
values contain some noise, which makes the retrieval performance
fluctuate.

For the color autocorrelogram using LSI, the average precision
reaches its maximum of 83.11% at k = 10, and for k greater than 140
all retrieval measures remain constant. The two features reach their
maximum precision at different values of k (30 for the histogram, 10
for the autocorrelogram). However, their maximum precisions are
close (83.99% and 83.11%).

Based on Tab. I and Tab. II, Fig. 3 plots the average precision
against k for the color histogram and the color autocorrelogram
using LSI:
[Figure: precision (0 to 1) versus dimension k, with one curve for the color histogram and one for the color autocorrelogram.]
Figure 3. Precision of color histogram using LSI and color
autocorrelogram using LSI with different k.
Tab. I, Tab. II and Fig. 3 show that the choice of k has a clear
effect on the retrieval results.

All experiment results are shown in Tab. III and Fig. 4:

TABLE III. STATISTICS OF EXPERIMENT RESULTS
Feature                 Processing                          Precision   Average-r   Average-p
Color histogram         Raw data                            56.23%      12.86       0.689
                        Normalized                          70.21%      8.01        0.849
                        Normalized, weighted                83.11%      7.46        0.894
                        Normalized, weighted, SVD (k=30)    83.99%      7.31        0.900
Color autocorrelogram   Raw data                            56.67%      9.74        0.726
                        Normalized                          47.78%      12.34       0.635
                        Normalized, weighted                67.12%      8.69        0.795
                        Normalized, weighted, SVD (k=10)    83.11%      6.40        0.919
[Figure: bar chart of precision for the color histogram and the color autocorrelogram under four settings: raw data; normalized; normalized, weighted; and normalized, weighted with SVD.]
Figure 4. Precision contrast of retrieval results.

Tab. III and Fig. 4 show that, for the color histogram,
normalization and term weighting substantially improve retrieval
performance, while the further gain from singular value
decomposition is small. For the color autocorrelogram, normalization
makes retrieval performance drop considerably, but term weighting
and SVD both improve it substantially.

Fig. 4 also shows that both features converge to approximately the
same final value after normalization, term weighting and SVD. This
suggests that, for these two low-level features, the retrieval
performance achievable with LSI tends toward a fixed upper bound. In
other words, LSI cannot improve retrieval performance indefinitely;
it is limited by the images themselves and by other factors.

Nevertheless, LSI clearly improves retrieval performance greatly on
all three measures. For example, precision rises from 56.23% to
83.99% for the color histogram and from 56.67% to 83.11% for the
color autocorrelogram (Tab. III). LSI thus has the ability to group
interrelated semantic index items together automatically. For
gastroscopic images, LSI groups the index items associated with
cancer lesions together, and those not associated with cancer
lesions together, which improves retrieval performance.

VI. CONCLUSIONS
For gastroscopic images, this paper uses LSI to implement image
retrieval based on semantic information. Experimental results on the
prototype system show that the proposed approach can greatly improve
retrieval performance. However, this improvement has a limit.
Breaking through it requires introducing other retrieval mechanisms
and techniques, which will be the subject of our future research.
REFERENCES
[1] Mustafa O, Ediz P. A color image segmentation approach for content-based image retrieval. Pattern Recognition, 2007, 40(4): 1318-1325.
[2] Naoto K, Yasuo M. Database retrieval for similar images using ICA and PCA bases. Engineering Applications of Artificial Intelligence, 2005, 18(6): 705-717.
[3] Fang YCZ, Bang TM, Chuan KS. Endoscope diagnosis and differential diagnosis map. Liaoning Science and Technology Publishing House, 2003.7.
[4] Adam W, Peter Y. Content-based image retrieval using joint correlograms. Multimedia Tools and Applications, 2007, 34(2): 239-248.
[5] Zhao R, Grosky W I. Negotiating the semantic gap: From feature maps to semantic landscapes. Pattern Recognition, 2002, 35: 593-600.
[6] Tai XY, Bei YE. Introduction to information retrieval technology. Beijing: Science Press, 2006.