dataset
Abdul Monem S. Rahma#1, Ali Adel Saeid#2, Muhsen J. Abdul Hussien#3
#Department of Computer Science, University of Technology
Baghdad, Iraq
1 110003@uotechnology.edu.iq
Abstract— Cuneiform symbols represent a complex problem in pattern recognition, in particular for OCR (optical character recognition), due to challenges related to character distortion and font heterogeneity. This paper proposes a new approach to recognising Assyrian cuneiform characters with OCR by classifying the symbols that make up each complex character. The dataset utilised consists of 16 patterns that reflect all the possibilities associated with the shape and direction of each cuneiform symbol, assuming each character consists of a set of symbols. Polygon approximation techniques are used to generate feature vectors for the classification tasks. The proposed method obtains classification rates of up to 91%, depending on the algorithm used to build the feature vector.
Keywords— Cuneiform, polygon approximation, dataset, pattern recognition, OCR.
I. INTRODUCTION
Cuneiform writing is one of the oldest writing systems, invented in Mesopotamia in the late fourth millennium BC, around 3200 BC [1]. The system began as a collection of symbols depicting images of things and appeared in the ancient Sumerian language. This language underwent stages of evolution that transformed the symbols into the cuneiform patterns used in the Babylonian and Assyrian languages. The cuneiform system differs from the visual hieroglyphic language in that it is more of a vocal and expressive language, whose signs are combined in different ways to express particular meanings. Around 100,000 cuneiform tablets have been discovered and are now held in museums around the world [2], notably the Iraqi Museum in Baghdad, which contains nearly 20,000 cuneiform tablets representing different civilisations. Because few translations deal with the cuneiform language, it is necessary to use information technology, especially the areas dealing with the interpretation of patterns and symbols, to address the problem of translation. The field has therefore been open for researchers to adopt different concepts and approaches to achieve efficient translations.
From a recognition perspective, Hilal Yousif [1] adopted a density curve of cuneiform symbols to create feature vectors and classified the symbols using a KNN classifier. Fahimeh Mostofi [3] suggested a character recognition system for Old Persian cuneiform based on a neural network methodology for the classification tasks. Leonard Rothacker [4] proposed a new approach for identifying cuneiform symbols that models each symbol statistically (e.g., Bag-of-Features and Hidden Markov Models) using SIFT descriptors. For documenting and presenting images of cuneiform tablets, Jonathan Cohen [5] adopted an internet deployment platform that supports searching a digital archive of cuneiform tablet images with different 2D views, built on a scanning technique and Java programming tools.
This paper proposes a new method to recognise Assyrian cuneiform characters using OCR techniques. The process analyses the segmented symbols of a cuneiform character to determine their quantity and directions (horizontal, vertical, or diagonal). These are used as classification characteristics through a generated feature vector containing appropriate boundary features. A polygon approximation technique creates the feature vector.
II. ASSYRIAN CUNEIFORM LANGUAGE
The Assyrian cuneiform language represents one stage in the development of cuneiform writing in Mesopotamia, which continued from the beginning of the first millennium BC to 600 BC. Its method relies on inscribing symbols on clay or stone tablets from left to right to form groups that reflect basic language meanings. The cuneiform language includes a set of about 600 letters, each of which consists of one or more symbols. These symbols, or wedges, are organised in horizontal, vertical, or diagonal directions [1]. The letters differ from one another in the number of symbols, their direction, and their location, as seen in Figure (1).
Many challenges associated with cuneiform writing create obstacles for analysis and recognition. One issue is the distortion of characters and the heterogeneity of fonts and patterns, as compared in Figure (2). Another complication results from the shadows of symbols, which may change from one image to another of the same character because the angle of reflected light varies with the three-dimensional geometry of the cuneiform symbol [6].
Fig. (1). Models of Assyrian cuneiform writings.
Fig. (2). Different fonts may be distinguished between the first (a-c) and second letters (d-e).
Figure (3) illustrates how the locations of dark areas can vary from one image to another depending on the angle of the illuminating light. This issue affects character recognition and is evident when two images of the same character undergo segmentation and produce different results.
Fig. (3). The effect of the direction of reflected light.
1. pre-processing
2. segmentation
3. feature extraction
4. classification
5. post-processing
A. Pre-processing
This first step consists of sequential processes that deal with the raw image data. The aim is to remove noise from the image and enhance the image data to the quality required by the subsequent stages. The pre-processing utilised in this paper follows these steps:
1. Image enhancement: Remove the noise introduced when the image was scanned or photographed. The median filter is used,
$Y(n) = \operatorname{med}[X(n-k), \ldots, X(n), \ldots, X(n+k)]$ …(1)
where Y(n) is the output image at position n and [X(n - k), ..., X(n + k)] are the ranked pixel values within a window of a specific size.
2. Image binarisation: Convert the grey-level intensities to a binary image with only two tones, black and white, representing the background and foreground regions. This paper adopts Otsu's method [9,10] for global binarisation to segment the cuneiform images. The method selects the threshold value that minimises the sum of the weighted variances of the background, b, and the foreground, f. The image intensities are first separated into two intervals, dark and light: the first is V0 = {0, 1, ..., v} and the second is V1 = {v, v+1, ..., l-1, l}. The threshold value is then calculated according to the following formulas for each interval:
where
$w_b(v) = \sum_{i=1}^{v} p(i)$ …(3)
$\mu_b(v) = \sum_{i=1}^{v} i \, p(i) / w_b(v)$ …(5)
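The two pre-processing steps can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`median_filter`, `otsu_threshold`, `binarise`) are hypothetical, the input is assumed to be an 8-bit grey-level image, p(i) is taken to be the normalised histogram value of intensity i, the sums start at intensity 0 rather than 1, and treating the brighter class as foreground is an arbitrary choice.

```python
import numpy as np


def median_filter(image, k=1):
    """2-D form of Eq. (1): each output pixel is the median of its
    (2k+1) x (2k+1) neighbourhood. Edge replication is an assumption."""
    padded = np.pad(image, k, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (2 * k + 1, 2 * k + 1)
    )
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)


def otsu_threshold(image, levels=256):
    """Return the threshold v that minimises the weighted sum of the
    background and foreground variances (Otsu's criterion)."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()          # p(i): normalised histogram (assumed)
    i = np.arange(levels)
    best_v, best_score = 0, np.inf
    for v in range(levels - 1):
        w_b = p[: v + 1].sum()     # background weight, cf. Eq. (3)
        w_f = p[v + 1:].sum()      # foreground weight
        if w_b == 0 or w_f == 0:
            continue               # one class is empty, skip this v
        mu_b = (i[: v + 1] * p[: v + 1]).sum() / w_b   # cf. Eq. (5)
        mu_f = (i[v + 1:] * p[v + 1:]).sum() / w_f
        var_b = (((i[: v + 1] - mu_b) ** 2) * p[: v + 1]).sum() / w_b
        var_f = (((i[v + 1:] - mu_f) ** 2) * p[v + 1:]).sum() / w_f
        score = w_b * var_b + w_f * var_f
        if score < best_score:
            best_v, best_score = v, score
    return best_v


def binarise(image, k=1):
    """Median-filter the image, then threshold it with Otsu's method.
    Pixels brighter than the threshold are marked True."""
    smoothed = median_filter(image, k)
    return smoothed > otsu_threshold(smoothed)
```

Filtering before thresholding means isolated noise pixels do not distort the histogram from which the threshold is chosen. Whether the wedges end up as the dark or the light class depends on how the tablet was imaged, so the comparison in `binarise` may need to be inverted.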
where
n: number of feature vectors in the dataset,
l_i: the i-th feature vector in the dataset,
t: the tested feature vector.
TABLE I
COMPARISON BETWEEN THE PROPOSED FEATURE VECTOR ALGORITHMS AND OTHER ALGORITHMS
Fig. (13). The diagram for classifying the dataset patterns.
REFERENCES