
SICE Annual Conference 2013

September 14-17, 2013, Nagoya University, Nagoya, Japan

Hand Written Character Recognition using Star-Layered Histogram Features
Stephen Karungaru, Kenji Terada and Minoru Fukumi
Dept of Information Science and Intelligent System,
University of Tokushima,
2-1 Minami Josanjima, Tokushima, Japan.
(Tel: +81-88-656-7488; Email: karunga@is.tokushima-u.ac.jp)
Abstract: In this paper, we present a character recognition method using features extracted from a star layered histogram
and trained using neural networks. After several image preprocessing steps, the character region is extracted. Its contour
is then used to determine the center of gravity (COG). The COG is used as the origin to create a histogram using
equally spaced lines extending from the COG to the contour. The first point at which a line touches the character represents
the first layer of the histogram. If the line extension has not reached the region boundary, the next hit represents the
second layer of the histogram. This process is repeated until the line touches the boundary of the character's region. After
normalization, these features are used to train a neural network. This method achieves an accuracy of about 93% using
the MNIST database of handwritten digits.
Keywords: Star Layered Histogram, Character Recognition, Neural Networks

1. INTRODUCTION

In this work, we present a character recognition method using features extracted from a star layered
histogram and trained using neural networks. After several image preprocessing steps, the character
region is extracted. Its contour is then used to determine the center of gravity (COG). The COG is
used as the origin to create a histogram with equally spaced lines extending to the contour. The
first hit along the line represents the first layer of the histogram. The effectiveness of various
types of neural networks in solving a variety of problems has recently been shown in [9] for
partially connected neural networks (PCNN), [10] for recurrent neural networks (RNN) and [11] for
perceptron neural networks. This adds confidence to the use of neural networks in learning
problems. Character recognition using neural networks to determine a threshold is proposed in [12].

Communication using automatic verbal-to-text conversion has gained a lot of attention lately,
especially in mobile devices. However, hand written character recognition remains a vital research
area, mainly because of its application to human-machine and machine-machine communication. It
remains a challenge even though great improvements have been achieved using digital pens and touch
screens. Another area of character recognition is virtual scenes. In such scenes, the characters
are written in the air by hand and captured using a cheap USB camera placed in front of the
subject. Such characters are termed Air Characters in this work. Fortunately, many useful
technologies for the automatic detection and recognition of characters have already been proposed.
Furthermore, recognition of air characters will open new areas in human-machine interfaces,
especially in replacing TV remote control devices and enabling non-verbal communication. Three
steps are necessary in such systems: size and orientation invariant segmentation of the characters;
normalization of other factors such as brightness, contrast and illumination; and recognition of
the characters themselves. Today there are many OCR devices in use, based on a plethora of
different algorithms [1]. Examples include a wavelet transform based method for extracting license
plates from cluttered images, achieving 92.4% accuracy [2], and a morphology-based method for
detecting license plates from cluttered images with a detection accuracy of 98% [3]. The Hough
transform combined with other preprocessing methods is used by [4], [5]. In [6] an efficient object
detection method is proposed. More recently, license plate recognition from low-quality videos
using morphological and AdaBoost algorithms was proposed by [7]. It uses the Haar-like features
proposed by [8] for face detection.

1.1 Star layered histogram feature extraction algorithm


The idea is to extract robust shape features of the characters and learn them using a neural
network. Many shape analysis methods use polygons to define the shape of an object. The polygon
captures the outside shape of the object and the vertices are used as features. There are
two problems with this approach.
1. The number of vertices per object cannot be controlled; therefore learning with fixed-input
algorithms like neural networks becomes difficult.
2. The polygons capture only the outer shape of the objects. This is only useful for block objects
without inside contours. Hand written characters include many inner details that cannot be captured
using the outer polygons.
In this work we propose a star layered histogram method to capture all the shape characteristics of
the characters and use the features for neural network learning. The details are explained below.

1.2 Star Shaped

Several image pre-processing steps must be performed before extraction of the star histogram
features. These include (explained later in the paper) determining the smallest rectangle enclosing
the character and determining its center of gravity.
Using this information, in its most basic form, the extraction of the star features consists of
eight lines that are extended through the center of gravity to the four vertices of the rectangle
and to the centers of both the width and the height. The eight lines form the basis of the star
method. This is shown in Fig. 1.

Fig. 1 Basic star with object showing the points at which the lines touch the object (blue points).

Note that the star will consist of four straight lines if the center of gravity of the rectangle is
the same as that of the character. If the centers of gravity are different, then eight lines are
required. Assuming that a character is enclosed in the rectangle, the star is created by extending
every line along the set axis and determining its length (measured from the COG point) when it
touches the character; the lengths are expressed in a histogram, Fig. 2. Therefore, a minimum of
eight features per character can be obtained, each equal to the length of the line segment. At this
stage, when the line touches the character, the line extension process is terminated. If the line
extends all the way to the rectangle without touching the character, then its length is set to zero.

Fig. 2 Basic histogram showing the 8 features represented by their normalized lengths.

The final feature data will consist of the character identifier, the number of features and the
eight feature data. By dividing that data by the longest line segment, data normalization can be
achieved. Data normalization deals with character size differences. Each character can then be
represented by a ten value feature vector that can be visualized as a normal histogram, Fig. 2.

1.3 Layered histogram

A feature vector extracted as detailed above works well on some numerals like zero or one. As shown
by Fig. 3, this method fails for most other numerals. The problem is the exact opposite of the
polygon method. That is, only the inner shape of a character is likely to be extracted.

Fig. 3 If the line tracing is allowed to extend until the edges of the rectangle, then more than one
feature per line is possible. The blue dots show the first feature and the green dots the second
feature. If no green dots are found on a line, that feature is zero.

The solution to this problem is, instead of terminating the line extension at the first point it
touches the character, to continue the line extension until it touches the enclosing rectangle
edges. Along this path, every time the line touches the character, the length value is noted.
Therefore, one line can be represented by more than one value. This is the basis of the layered
histogram. Unlike the normal histogram, each line segment is allowed to take more than one value,
Fig. 4. Therefore, the number of features per character depends on the maximum number of values
used to represent a line.

Fig. 4 Layered Histogram. Each of the basic 8 lines can take more than one feature. If only one
feature exists, the others are set to zero.

1.4 Image re-sampling

To capture all the points along the line accurately, the image must be re-sampled several times.
Changing the image size enables the detection of feature points that could have been missed at
larger sizes. The final points are then determined after the re-sampling process. The re-sampling
is based on the size of the image. In this work, it is done twice by reducing the image size by
about 10% each time.
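The star layered feature extraction described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `star_layered_features` is hypothetical, the COG is assumed to be the mean of all character pixels, the lines are sampled at unit steps with nearest-pixel rounding, and the input is assumed to be an already binarized (and thinned) image given as a list of rows with 1 marking the character. Re-sampling is not included.

```python
import math

def star_layered_features(image, n_layers=2):
    """Return 8 * n_layers normalized hit distances for a binary image (1 = character)."""
    pts = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    if not pts:
        return [0.0] * (8 * n_layers)
    # Smallest rectangle enclosing the character.
    top, bottom = min(r for r, _ in pts), max(r for r, _ in pts)
    left, right = min(c for _, c in pts), max(c for _, c in pts)
    # Assumption: COG taken as the mean position of all character pixels.
    cog_r = sum(r for r, _ in pts) / len(pts)
    cog_c = sum(c for _, c in pts) / len(pts)
    mid_r, mid_c = (top + bottom) / 2.0, (left + right) / 2.0
    # Line end points: the four rectangle vertices and the centers of its four sides.
    targets = [(top, left), (top, mid_c), (top, right), (mid_r, right),
               (bottom, right), (bottom, mid_c), (bottom, left), (mid_r, left)]
    features = []
    for tr, tc in targets:
        length = math.hypot(tr - cog_r, tc - cog_c)
        hits, inside = [], False
        for k in range(1, int(math.ceil(length)) + 1):
            d = min(float(k), length)  # the last sample lies exactly on the rectangle
            r = int(round(cog_r + (tr - cog_r) * d / length))
            c = int(round(cog_c + (tc - cog_c) * d / length))
            on_char = bool(image[r][c])
            # A hit is recorded each time the line enters the character.
            if on_char and not inside and len(hits) < n_layers:
                hits.append(d)
            inside = on_char
        # Layers without a hit are set to zero.
        features.extend(hits + [0.0] * (n_layers - len(hits)))
    longest = max(features)
    # Normalize by the longest line segment to remove size differences.
    return [f / longest for f in features] if longest > 0 else features
```

With `n_layers=1` this gives the basic 8-feature star; `n_layers=2` and `n_layers=3` give the 16 and 24 feature vectors, with each line contributing its layers consecutively.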

2. PRE-PROCESSING

1. It has at least two and at most six neighbors
2. Left, top and bottom neighbors exist
3. Right, top and bottom neighbors exist
The second iteration phase is similar to the first except that the order of the last two processes
is reversed. The algorithm terminates if no more pixels can be deleted after the two iterations.
Fig. 6 shows example results achieved using the Zhang-Suen Thinning Algorithm.
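The two-pass thinning procedure can be sketched as below. This is a hedged illustration that follows the commonly cited formulation of the Zhang-Suen algorithm [14] (neighbor count between two and six, exactly one background-to-character transition around the pixel, and two directional background tests that are swapped in the second pass); it may differ in detail from the conditions as listed in this paper. The function name `zhang_suen_thin` is hypothetical, and the image is assumed to have a one-pixel background border.

```python
def zhang_suen_thin(image):
    """Thin a binary image given as a list of rows of 0/1 (1 = character)."""
    img = [list(row) for row in image]
    rows, cols = len(img), len(img[0])
    # Neighbors P2..P9, clockwise, starting from the pixel directly above.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = [img[r + dr][c + dc] for dr, dc in offsets]
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    b = sum(p)  # number of character neighbors
                    # Number of 0 -> 1 transitions in the circular sequence P2..P9.
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if step == 0:
                        directional = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        directional = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and directional:
                        to_delete.append((r, c))
            # Parallel update: pixels are removed only after the whole pass.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img
```

Collecting the pixels first and deleting them afterwards is what makes each pass depend only on the result of the previous one.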

2.1 Image Binarization

The Discriminant Analysis (DA) method is used to automatically determine the threshold used to
binarize the image. The DA method classifies data into two classes, solving for eigenvectors which
maximize the between-class variance and minimize the within-class variance. The algorithm proceeds
as follows [13]. For a given image, where $\omega_i$, $M_i$, $\sigma_i^2$ and $M_T$ represent the
total number of pixels in class $i$, the class average brightness, the class variance and the
overall average brightness respectively, the within-class variance ($\sigma_W^2$) is given by:

$$\sigma_W^2 = \frac{\omega_1 \sigma_1^2 + \omega_2 \sigma_2^2}{\omega_1 + \omega_2} \tag{1}$$

and the between-class variance ($\sigma_B^2$) can be calculated using:

$$\sigma_B^2 = \frac{\omega_1 \omega_2 (M_1 - M_2)^2}{(\omega_1 + \omega_2)^2} \tag{2}$$

The total variance is given by:

$$\sigma_T^2 = \sigma_W^2 + \sigma_B^2 \tag{3}$$

The threshold can be determined by maximizing $\sigma_B^2$ in the following equation.

$$\frac{\sigma_B^2}{\sigma_W^2} = \frac{\sigma_B^2}{\sigma_T^2 - \sigma_B^2} \tag{4}$$

Fig. 6 Thinning results using the Zhang-Suen Thinning Algorithm.

However, the results were not perfect for all characters. Therefore, a pruning algorithm is
necessary to remove such noise.

3. NEURAL NETWORKS

In this work, we chose Neural Networks (NN) as the main classifiers because of their proven
effectiveness in learning multi-dimensional and non-linear data [9], [11]. There are 10 numerals
that must be recognized.
Although neural networks can learn from large non-linear data, the process requires thousands of
training examples and huge computation time. Therefore, to effectively use neural networks in real
time, their structure should be simple and the number of classes to be learned should be minimized.
In this work, the input layer can have 8, 16 or 24 inputs depending on whether one, two or three
layered histogram features are used respectively. There are 10 outputs representing the numerals.
The hidden layer has between 100 and 150 nodes. The neural networks are 3-layered and trained using
the back propagation algorithm [15]. The system is trained to produce an output of 0.95 for the
node representing the numeral being learned and 0.05 for all the other output nodes. To further
reduce the size of these neural networks, improving the training and test speeds, structural
learning with knowledge [16] is used to supplement the error back propagation method.
Moreover, a 5-fold cross validation method is used during the training of the neural network.
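A minimal sketch of such a 3-layered network and one error back propagation update is given below. Only the layer sizes and the 0.95/0.05 target coding come from the text; the sigmoid activation, squared-error loss, weight initialization range, learning rate and all names (`ThreeLayerNet`, `make_targets`, `train_step`) are assumptions made for illustration. Structural learning [16] is not included.

```python
import math
import random

def make_targets(digit, n_classes=10, on=0.95, off=0.05):
    """Target vector: `on` for the numeral being learned, `off` for all other nodes."""
    return [on if k == digit else off for k in range(n_classes)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerNet:
    def __init__(self, n_in=16, n_hidden=100, n_out=10, seed=0):
        rng = random.Random(seed)
        # One row of weights per node; the last entry of each row is the bias.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        """Return (hidden activations, output activations) for a list of inputs."""
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One back propagation update; returns the squared error before the update."""
        h, y = self.forward(x)
        # Output deltas for squared error with sigmoid units.
        d_out = [(y[k] - target[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
        # Hidden deltas, propagated through the output weights before they change.
        d_hid = [h[j] * (1.0 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        return sum((y[k] - target[k]) ** 2 for k in range(len(y)))
```

The recognized numeral would be taken as the index of the largest output, for example `y.index(max(y))` on the second value returned by `forward`.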

Note that the discriminant analysis method can automatically determine a good threshold in any
image. A selected result is shown in Fig. 5.

Fig. 5 Results of binarization using the Discriminant Analysis method.
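The threshold selection of the discriminant analysis method can be sketched as follows. Because the total variance does not depend on the threshold, maximizing the ratio of between-class to within-class variance is equivalent to maximizing the between-class variance alone, which is what the sketch does. The function names, the 256 grey levels and the convention that pixels at or above the threshold become 1 are assumptions for illustration.

```python
def discriminant_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance.

    Class 1 holds the pixels with value < t, class 2 holds the rest.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(g * n for g, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w1, sum1 = 0, 0  # pixel count and brightness sum of class 1
    for t in range(1, levels):
        w1 += hist[t - 1]
        sum1 += (t - 1) * hist[t - 1]
        w2 = total - w1
        if w1 == 0 or w2 == 0:
            continue  # one class is empty, so the variance is undefined
        m1 = sum1 / w1
        m2 = (total_sum - sum1) / w2
        var_b = w1 * w2 * (m1 - m2) ** 2 / (w1 + w2) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t

def binarize(pixels, t):
    """Assumed convention: values at or above the threshold map to 1."""
    return [1 if p >= t else 0 for p in pixels]
```

For an image stored as rows, the pixel list can be built by concatenating the rows before calling `discriminant_threshold`.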
2.2 Image Thinning
An image skeleton is useful because it represents the shape of the object in a relatively small
number of pixels (Parker, 1994). This reduction of information speeds up the analysis or
recognition processes performed afterwards. Thinning is an iterative technique which extracts
the skeleton of an object. In every iteration, the edge pixels having more than one adjacent
background pixel are eroded if their removal does not change the topology of the character. We
employ the Zhang-Suen Thinning Algorithm [14] because it is fast and easy to process. This
skeletonization algorithm is a parallel method in which each new value depends only on the values
of the previous iteration. The algorithm can be implemented using two iterations. In the first
iteration, a pixel is deleted (in order) if

4. EXPERIMENTS
4.1 Database
The MNIST database [17] of handwritten digits, used in this work, has a training set of 60,000
examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a
fixed-size image. The MNIST database was constructed from NIST's Special Database 3 and Special
Database 1, which contain binary images of handwritten digits. NIST originally designated SD-3 as
their training set and SD-1 as their test set. SD-1 contains 58,527 digit images written by 500
different writers. The original black and white (bilevel) images from NIST were size normalized to
fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey
levels as a result of the anti-aliasing technique used by the normalization algorithm.

Table 2 Two-Layered Histogram model results
The neural network structure developed for this work is a four layer network. The first layer has
8, 16 or 24 nodes, representing the normal, two-layered and three-layered cases respectively. The
two hidden layers were fixed at 100 nodes after several trials. The output layer consists of 10
nodes, one for each of the 10 handwritten numerals.
From the database, 5000 samples per character are selected at random for learning. The 5000
characters are further subdivided into 5 groups of 1000 characters each to enable 5-fold cross
validation. Moreover, the neural network was trained for two cases: feeding the training samples
in order or at random.
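The 5-fold subdivision can be sketched as below. The function name `five_fold_splits` and the `shuffle` and `seed` parameters are hypothetical; `shuffle` is only one possible way to model feeding the samples at random rather than in order.

```python
import random

def five_fold_splits(samples, n_folds=5, shuffle=False, seed=0):
    """Yield (train, validate) lists; each group serves once as the validation set.

    Samples left over when the count is not divisible by n_folds are dropped.
    """
    items = list(samples)
    if shuffle:
        random.Random(seed).shuffle(items)
    fold_size = len(items) // n_folds
    folds = [items[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    for k in range(n_folds):
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, folds[k]
```

With 5000 samples per character this gives 4000 training and 1000 validation samples in each of the five rounds.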
The final testing set consists of 10010 characters. The final results are reported as the
percentage of characters in the test set that are recognized correctly.

4.2 Normal
The results in this section represent the case where the basic star model consisting of 8 features
was used. This means that, starting from the character's center of gravity, the first hit is taken
as the feature for each line. Therefore, the number of features per character used for training is
8. The results are shown in Table 1.

Table 1 Normal star model results

Training order   Training (%)   Validate (%)   Testing (%)
Normal           97.0           92.2           96.2
Random           96.7           92.8           96.9

4.3 Two-Layered Histogram
In this case, each line segment was extended until at least two hits were produced or the edge of
the character was reached. Therefore, 16 features per character were available for training. The
results are shown in Table 2.

Table 2 Two-Layered Histogram model results

Training order   Training (%)   Validate (%)   Testing (%)
Normal           98.0           93.7           96.6
Random           98.8           94.1           97.3

4.4 Three-Layered Histogram
In the three-layered model, each line segment was extended until at least three hits were produced
or the edge of the character was reached. Therefore, 24 features per character were available for
training. The results are shown in Table 3.

Table 3 Three-Layered Histogram model results

Training order   Training (%)   Validate (%)   Testing (%)
Normal           95.0           91.2           93.4
Random           96.1           92.5           93.5

4.5 Comparison to other methods
Many other methods have been tested with this training set and test set [17]. Of these methods, 1,
2 and 3-layer neural networks are applied in about 18 cases. A 6-layer network is also used. The
results show an error rate of between 0.35 and 4.7% for 6-layered and 2-layered networks
respectively. The proposed system produces an error rate of 2.7% using only the star-layered
features and a 3-layered neural network. This is within the range (1.53 to 3.05%) of similar
structured neural nets.

4.6 Discussions
There are many errors carried over from the preprocessing steps in this work that affect the final
accuracy. These include binarization and thinning. A universal threshold determined by the
discriminant method is used. This caused some breaks in the numerals. We must consider using the
variable threshold method in the future to solve this problem. In each window the threshold will
still be determined using the discriminant method. The thinning algorithm used requires pruning.
The thinned image directly determines the value of the features. Therefore, a more accurate method
should be applied in the future.
Re-sampling the image enables the capture of features displaced due to alignment. Note that we are
dealing with a binary image and extracting features on straight lines. Therefore, it is possible
for some pixels to be off the line at some sampling rate. The two-layered model offered the best
result because most of the features were accurately captured. It turns out that re-sampling 3
times deletes some of these useful features. We must find the optimum sampling rate in the future.

5. CONCLUSION

In this paper, we presented a character recognition method using features extracted from a star
layered histogram and trained using neural networks. After the character region is extracted, its
contour is used to determine the center of gravity (COG), which is used as the origin to create a
histogram using equally spaced lines extending from it. The first point at which a line touches
the character represents the first layer of the histogram. If the line extension has not reached
the region boundary, the next hit represents the second layer of the histogram. This process is
repeated until the line touches the boundary of the character's region. After normalization, these
features are used to train a neural network to evaluate their effectiveness in numeral
classification. This method achieves an accuracy of about 97.1% using the MNIST database of
handwritten digits.
In future, we must analyze the method to determine the most effective way of representing the
features, because most of them are zero, especially in the 24 feature vector. Moreover, other
types of features including bifurcation points, area, edge gradient, etc. must be considered to
improve the recognition accuracy. Other classification methods and databases also need to be
considered in the future.

REFERENCES
[1] Eric W. Brown, Character Recognition by Feature Point Extraction,
http://www.ccs.neu.edu/home/feneric/charrec.html, 2010.
[2] Ching-Tang Hsieh, Yu-Shan Juan and Kuo-Ming Hung, Multiple License Plate Detection for Complex
Background, Advanced Information Networking and Applications, pp.389-392, 2005.
[3] Jun-Wei Hsieh, Shih-Hao Yu and Yung-Sheng Chen, Morphology-Based License Plate Detection from
Complex Scenes, Proc. of International Conference on Pattern Recognition, pp.176-179, 2002.
[4] Yanamura Y., Goto M., Nishiyama D., Soga M., Nakatani H. and Saji H., Extraction And Tracking
Of The License Plate Using Hough Transform And Voted Block Matching, Proc. of IEEE IV Intelligent
Vehicles Symposium, pp.243-6, 2003.
[5] Kamat V. and Ganesan S., An efficient implementation of the Hough transform for detecting
vehicle license plates using DSP's, Proc. of Real-Time Technology and Applications Symposium,
pp.58-9, 1995.
[6] Viola P. and Jones M., Rapid Object Detection Using a Boosted Cascade of Simple Features, Proc.
of Computer Vision and Pattern Recognition, vol.1, pp.511-518, 2001.
[7] Chih-Chiang Chen and Jun-Wei Hsieh, License Plate Recognition from Low-Quality Videos, Proc. of
the IAPR Conference on Machine Vision Applications, pp.122-125, 2007.
[8] P. Viola and M. J. Jones, Robust real-time face detection, International Journal of Computer
Vision, vol.57, no.2, pp.137-154, 2004.
[9] Y. Abe, M. Konishi and J. Imai, Neural network based diagnosis system for looper height
controller of hot strip mills, International Journal Innovative Computing, Information and
Control, vol.3, no.4, pp.919-935, 2007.
[10] Fekih, A., H. Xu and F. Chowdhury, Neural networks based system identification techniques for
model based fault detection of nonlinear systems, International Journal Innovative Computing,
Information and Control, vol.3, no.5, pp.1073-1085, 2007.
[11] L. Mi and F. Takeda, Analysis on the robustness of the pressure-based individual
identification system based on neural networks, International Journal Innovative Computing,
Information and Control, vol.3, no.1, pp.97-110, 2007.
[12] M. Fukumi, Y. Takeuchi, H. Fukumoto, Y. Mitsukura and M. Khalid, Neural Network Based
Threshold Determination for Malaysia License Plate Character Recognition, Proc. of 9th
International Conference on Mechatronics Technology, Vol.1, No.T1-4, pp.15, Kuala Lumpur, 2005.
[13] Tamura Hideyuki, Computer image processing, pp.140, Ohmsha Co., Ltd, 2002.
[14] Zhang, T. Y. and Suen, Ching Y., A Fast Parallel Algorithm for Thinning Digital Patterns,
Communications of the ACM, Vol.27, No.3, pp.236-239, March 1984.
[15] Kah-Kay Sung, Learning and example selection for object and pattern recognition, PhD Thesis,
MIT AI Lab, 1996.
[16] M. Ishikawa, Structure learning with forgetting, Neural networks journal, Vol.9, No.3,
pp.509-521, 1993.
[17] Yann LeCun and Corinna Cortes, THE MNIST DATABASE of handwritten digits, available at
http://yann.lecun.com/exdb/mnist/, 2013.