
A Modified Approach to Thinning of Devanagri Characters

Ms. Aarti Desai

Dr. Latesh Malik

Department of Computer Science & Engg.


G. H. Raisoni College Of Engineering
Nagpur, India
aartikarandikar@gmail.com

Department of Computer Science & Engg.


G.H. Raisoni College of Engineering
Nagpur, India.
lgmalik@rediffmail.com
Abstract: In this paper we present a modified thinning algorithm for Devnagari character recognition. Devnagari characters are more difficult to thin than English characters because of the presence of loops and conjuncts. The proposed algorithm modifies the thinning algorithm by taking the special requirements of the Devnagari script into consideration. The character is first binarized and then crudely thinned, after which the noise is removed. We have ensured that the basic characteristics of the image are maintained while keeping the algorithm as simple as possible. The performance of the algorithm is 94.56%.

Keywords: character recognition, thinning, Devnagari characters.

I. INTRODUCTION

Character recognition, also known as OCR (Optical Character Recognition), is an important subset of the pattern recognition area. Offline character recognition has received great attention for many years owing to its contribution to the evolution of digital libraries. Thinning is an important preprocessing step for the correct working of any OCR system. The main objective of thinning in OCR is to reduce data storage while retaining the topological properties of the character. Thinning reduces the amount of storage by converting the binary image into a skeleton or line drawing.
Among all types of characters, Devnagari characters are among the most complicated to thin. The complexities in the thinning process arise from the presence of multiple loops, conjuncts, upper and lower modifiers, and the number of disconnected and multi-stroke characters in a word. This paper proposes an effective thinning algorithm for Devnagari characters.

II. DEVNAGARI SCRIPT

The Devnagari script is the most widely used Indian script. It is a moderately complex pattern. Unlike the simple juxtaposition of characters in Roman script, a word in Devnagari script is composed of composite characters joined by a horizontal line at the top. The basic alphabet set of Devnagari is very large, comprising about 13 vowels, 34 consonants and 14 matras. The number goes up once half-letter forms are also considered. It is used as the writing system for over 28 languages including Sanskrit, Hindi, Kashmiri, Marathi and Nepali. Devnagari characters are joined by a horizontal bar (Shirorekha) that creates an imaginary line from which Devnagari text is suspended. A single or double vertical line called a Danda (Spine) was traditionally used to indicate the end of a phrase or sentence. Figure 1 below shows the basic and special character set of Devnagari script.

Figure 1: Basic and special character set of Devnagari script.

The script has its own composition rules for combining vowels, consonants and modifiers. Vowels are used either to produce their own sound or to modify the sound of a consonant, in which case an appropriate modifier is attached to the consonant in an appropriate manner. Figure 2 below shows the modifiers in Devnagari script.
Modifier symbols are placed on top, bottom, left, right or on a combination of these. The consonants may also have a half form or shadow form. A half character is written touching the following character, resulting in a composite character. In part, Devnagari owes its complexity to its rich set of conjuncts.

___________________________________
978-1-4244-8679-3/11/$26.00 ©2011 IEEE

Figure 2: Modifiers

Figure 3: The modifier and the modified consonant.

III. LITERATURE SURVEY

Many thinning algorithms have been proposed in the literature, but most of them give unsatisfactory results for Devnagari script. A properly thinned image aids segmentation and feature extraction, which in turn are crucial factors for character recognition. Thinning algorithms are mainly classified into two groups: sequential thinning algorithms and parallel thinning algorithms. The main difference between the two is that sequential algorithms operate on one pixel at a time, while parallel algorithms operate on all pixels simultaneously.
S. Arora et al. [01] use the algorithm proposed in [02] for thinning Devnagari characters. The algorithm successively deletes dark points (i.e. changes them to white points) along the edges. Corner positions are special cases to be considered in the thinning procedure, and they result in some redundant pixels. To remove this redundancy, the authors apply certain masks.
The algorithm of R. W. Zhou et al. [03] uses both a flag map and a bitmap simultaneously to delete unwanted boundary pixels. They also incorporate smoothing templates to smooth the final skeleton. S. Ahmed et al. [04] obtain the thinned image by first making the whole image uniform and then thinning it by deleting the unnecessary pixels. [05] and [06] both provide good comprehensive surveys of thinning methods.

IV. PROPOSED ALGORITHM

One important limitation of most conventional algorithms is that they produce disconnected thinned characters. Moreover, some algorithms distort the matra, which results in a deformed character. The proposed algorithm avoids the objections described above and shows a way to obtain well-thinned Devnagari characters. Finally, the proposed algorithm is generic in the sense that it can thin Devnagari characters almost perfectly.
The proposed thinning algorithm works as follows. We first binarize the image: the whole scanned image is converted to a two-dimensional array of cells containing only white and black pixels. White (0) represents the background and black (1) represents the foreground. To this binarized image, we apply the structuring elements shown in Figure 5.

Figure 4: Input image

Figure 5: Structuring elements.

Figure 6: Binarized image
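The binarization step can be illustrated with a minimal Python sketch. The paper does not say how the threshold is chosen, so the fixed value of 128 on a 0-255 grayscale range and the function name `binarize` are assumptions made only for this example. The output follows the paper's convention that white background is 0 and black foreground is 1.

```python
def binarize(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 ints) to a 0/1 array.

    Dark pixels (ink) become 1 (black foreground) and light pixels become
    0 (white background). The threshold of 128 is an assumed value, not
    one taken from the paper.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A tiny 3x3 "scan": dark ink in the right-hand part, light paper elsewhere.
scan = [
    [255, 250, 30],
    [240, 20, 10],
    [255, 255, 200],
]
binary = binarize(scan)  # [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
```

A fixed threshold is the simplest choice; scanned pages with uneven lighting would likely need an adaptive threshold instead.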

The procedure is repeated till no further changes occur.
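Applying structuring elements repeatedly until the image stops changing can be sketched as below. The actual elements of Figure 5 are not reproduced in the text, so this sketch substitutes the common textbook pair of hit-or-miss thinning elements and their 90-degree rotations. That substitution, and every name in the code (`BASE_A`, `BASE_B`, `thin`, and so on), is an assumption for illustration and not the authors' implementation.

```python
X = None  # "don't care" cell in a structuring element

# Assumed textbook thinning elements (1 = must be black, 0 = must be white).
BASE_A = [[0, 0, 0],
          [X, 1, X],
          [1, 1, 1]]
BASE_B = [[X, 0, 0],
          [1, 1, 0],
          [X, 1, X]]


def rotate(element):
    """Rotate a 3x3 element by 90 degrees clockwise."""
    return [list(row) for row in zip(*element[::-1])]


def rotations(element):
    """Return the element in its four 90-degree orientations."""
    result = []
    for _ in range(4):
        result.append(element)
        element = rotate(element)
    return result


# Interleave the orientations of the two base elements: A0, B0, A1, B1, ...
ELEMENTS = [e for pair in zip(rotations(BASE_A), rotations(BASE_B)) for e in pair]


def matches(image, i, j, element):
    """True if the 3x3 window centred on (i, j) fits the element."""
    rows, cols = len(image), len(image[0])
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            wanted = element[di + 1][dj + 1]
            if wanted is None:
                continue
            r, c = i + di, j + dj
            # Pixels outside the image are treated as white background.
            pixel = image[r][c] if 0 <= r < rows and 0 <= c < cols else 0
            if pixel != wanted:
                return False
    return True


def thin(image):
    """Delete matching black pixels until a full cycle changes nothing."""
    image = [row[:] for row in image]  # work on a copy
    changed = True
    while changed:
        changed = False
        for element in ELEMENTS:
            hits = [(i, j)
                    for i in range(len(image))
                    for j in range(len(image[0]))
                    if image[i][j] == 1 and matches(image, i, j, element)]
            for i, j in hits:
                image[i][j] = 0
            changed = changed or bool(hits)
    return image
```

Each element is matched against the whole image before any pixel is deleted, so one element is applied in parallel, while the eight elements are applied one after another.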


The output image, i.e. the thinned binary image obtained after applying the structuring elements, still contains noise, as shown in Figure 7.
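The noise-removal rule that the paper gives for this image (a count of black neighbours, a count of white-black combinations, then conditions (i) and (ii)) can be read as the following Python sketch. Several details are assumptions, because the paper leaves them open: the neighbours are visited clockwise starting from (i-1, j), a pixel with exactly 2 black neighbours is treated as eligible, conditions (i) and (ii) are applied in two separate passes, and each pass is run once. The two conditions appear to mirror the two sub-iterations of the well-known Zhang-Suen thinning algorithm, but the paper does not say so.

```python
# Neighbours of (i, j), clockwise from the pixel directly above it:
# N, NE, E, SE, S, SW, W, NW (an assumed traversal order).
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def neighbours(image, i, j):
    """Values of the 8 neighbours of (i, j); outside the image counts as white."""
    rows, cols = len(image), len(image[0])
    return [image[i + di][j + dj]
            if 0 <= i + di < rows and 0 <= j + dj < cols else 0
            for di, dj in OFFSETS]


def removable(image, i, j, condition):
    """Decide whether black pixel (i, j) may be turned white under (i) or (ii)."""
    if image[i][j] != 1:
        return False
    n = neighbours(image, i, j)
    if sum(n) < 2:  # fewer than 2 black neighbours: not eligible
        return False
    # Count white-to-black steps while walking once around the pixel.
    transitions = sum(1 for k in range(8) if n[k] == 0 and n[(k + 1) % 8] == 1)
    if transitions != 1:
        return False
    north, east, south, west = n[0], n[2], n[4], n[6]
    if condition == 1:  # condition (i)
        return east == 0 or south == 0 or (north == 0 and west == 0)
    return north == 0 or west == 0 or (east == 0 and south == 0)  # condition (ii)


def remove_noise(image):
    """Apply condition (i), then condition (ii), once each (assumed schedule)."""
    image = [row[:] for row in image]  # work on a copy
    for condition in (1, 2):
        hits = [(i, j)
                for i in range(len(image))
                for j in range(len(image[0]))
                if removable(image, i, j, condition)]
        for i, j in hits:
            image[i][j] = 0
    return image


# A one-pixel-wide stroke with a single stray pixel sitting on top of it.
noisy = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = remove_noise(noisy)
```

In this example the stray pixel in the top row is removed while the stroke itself is left intact, because interior stroke pixels show two white-black combinations and the end points have only one black neighbour.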

Figure 7: Thinned binarized image containing noise

We remove this noise by checking whether a black pixel can safely be converted into a white pixel. To do so, we search the neighborhood of the black pixel, i.e. we count the number of black pixels around the black pixel under consideration. If this number is less than 2, the pixel is not eligible to be removed. If it is more than 2, we count the number of white-black color combinations around the considered pixel. If this number is not equal to 1, nothing is done. If it is 1, the following procedure is followed: if (i, j) is the black pixel under consideration (see Figure 7), it is converted to white:

(i) if (i, j+1) or (i+1, j) is white, or both (i-1, j) and (i, j-1) are white;
(ii) if (i-1, j) or (i, j-1) is white, or both (i, j+1) and (i+1, j) are white.

The output image is as shown in Figure 8.

i-1, j-1    i-1, j    i-1, j+1
i, j-1      i, j      i, j+1
i+1, j-1    i+1, j    i+1, j+1

Figure 7: 3x3 window frame of the considered pixel and its 8 surrounding pixels

Figure 8: Final thinned image.

V. RESULT AND CONCLUSION

The thinning algorithm proposed in this paper is well suited for Devnagari characters. We tested it on a database containing nearly 150 samples collected from 10 users. The final thinned image contains very little noise and maintains connectivity and loop structures. The performance of the algorithm is 94.56%. This encouraging performance makes the algorithm a good choice for Devnagari OCR systems. The algorithm can also be used for other Indian languages.

REFERENCES
[1] Sandhya Arora, Latesh Malik and Debotosh Bhattacharjee, "A Novel Approach For Handwritten Devnagari Recognition," in IEEE International Conference on Signal And Image Processing, Hubli, Karnataka, Dec. 7-9, 2006.
[2] M. Tellache, M. A. Sid-Ahmed, B. Abaza, "Thinning algorithms for Arabic OCR," IEEE Pac Rim 1993, pp. 248-251.
[3] R. W. Zhou, C. Quek, G. S. Ng, "A novel single-pass thinning algorithm and an effective set of performance criteria," Pattern Recognition Letters 16 (1995), pp. 1267-1275.
[4] S. Ahmed, M. Sharmin and Chowdhury Mofizur Rahman, "A Generic Thinning Algorithm with Better Performance," Proceedings of the 5th International Conference on Computer and Information Technology (ICCIT 2002), Bangladesh, Dec. 2002, pp. 241-246.
[5] Louisa Lam, Seong-Whan Lee, "Thinning Methodologies - A Comprehensive Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 9, September 1992, pp. 869-885.
[6] B. B. Chowdhury and U. Pal, "A complete Printed Character recognition Bangla OCR System," vol. 31(5), 1998, pp. 531-549.
[7] D. Akhter and M. M. Ali, "A Fast Thinning Algorithm for Bangla Characters," Proc. ICCIT, Dhaka, Bangladesh, 1998, pp. 132-136.
[8] Rafael C. Gonzalez and Richard Woods, Digital Image Processing, Second Edition, Pearson Education, 2004.
[9] R. Bajaj, L. Dey, S. Chaudhury, "Devnagari numeral recognition by combining decision of multiple connectionist classifier," Sadhana 27 (2002), pp. 59-72.
[10] E. R. Davies and A. P. Plummer, "Thinning Algorithms: A critique and new Methodology," Pattern Recognition 14, 1981, pp. 53-63.