
Computers in Human Behavior 39 (2014) 177–186

Contents lists available at ScienceDirect

Computers in Human Behavior


journal homepage: www.elsevier.com/locate/comphumbeh

A calligraphic based scheme to justify Arabic text improving readability and comprehension
Aqil M. Azmi a,⇑, Abeer Alsaiari b
a Department of Computer Science, King Saud University, Riyadh, Saudi Arabia
b Department of Computer Science, Taiba University, Madina, Saudi Arabia

Article info

Keywords: Arabic typography; Font; Kashida; Text justification; Readability; Comprehension

Abstract

Studies have shown a correlation between reading comprehension and the visual appearance of the displayed text. One of the factors that affect the visual look of a text is its alignment. The purpose of this paper is to develop and implement a sophisticated algorithm to output properly justified Arabic text. Most of the tools geared for e-documents have not been tailored with Arabic in mind. As a result, they either violate several calligraphic rules or are a far cry from the aesthetics developed by the centuries-old tradition of Arabic calligraphy. The scheme we developed is more realistic calligraphically and more pleasing aesthetically. It is a two-step process. Lines are populated with whole words; afterwards we use alternate forms of the letters to compress or stretch the line as needed. In the second step we use kashida (elongation of the connecting line between the letters) to fill in the remaining gaps. There are strict rules which dictate which letters may take a kashida, when, and the minimum/maximum length of the kashida a word can have. We tested our justified Arabic text on university students. The experiment revealed that the participants were able to read faster and had better comprehension when presented with our justified text. The scheme we devised could be extended to other languages which share the basic Arabic script, e.g. Persian and Urdu.

© 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +966 011 467 6574.
E-mail addresses: aqil@ksu.edu.sa (A.M. Azmi), abeer_alsaiari@yahoo.com (A. Alsaiari).
http://dx.doi.org/10.1016/j.chb.2014.07.003
0747-5632/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Typography (from the Greek words typos, impression, and graphia, graphy) is the art and technique of arranging type to make the language visible. This arrangement involves the selection of typefaces, point size, width of the line, the spacing between lines (interline spacing), spacing between groups of letters, the spacing between pairs of letters (kerning), and text justification. The typography of texts is of interest for two main reasons. First, the typography should not interfere with the reader's ability to understand the text; and second, the visual appearance of the text may influence the motivation to read. Just imagine reading this paper in cursive script. There are many studies in the English language that have reported a strong link between reading rate, accuracy, and comprehension (Fuchs, Fuchs, Hosp, & Jenkins, 2001; Tan & Nicholson, 1997).

Early studies examined typographic readability for printed matter, with later studies tackling e-texts or computer displays. The majority of the studies were for English typographies, e.g. (Bernard, Lida, Riley, Hackler, & Janzen, 2002). The studies have shown that certain typographical parameters such as typeface, margins and spacing do affect the readability of online text (Dyson, 2004). Campbell, Marchetti, and Mewhort (1981) compared the reading speed of 156 participants for right justified English text using three different techniques: no justification, right justified using fixed character spacing, and right justified using variable character spacing. Of these, the authors reported an improved reading speed for variable-spacing right justified text. However, these guidelines are for English text and they are not applicable to Arabic script due to orthographic differences (Ganayim & Ibrahim, 2013).

Arabic is the native language of over three hundred million people. The Arabic script is more widely used, with over a billion and a half Muslims who read the Qur'an in its original Arabic script. And just as most European languages use a Roman based alphabet, languages such as Persian, Urdu, Ottoman (Old Turkish), etc. all use an Arabic based alphabet. There are 28 basic letters in the Arabic script, which include 3 long vowels. Most of the letters in Arabic assume four different forms depending on the context: initial, middle, final and isolated. In addition there are short vowels (a, i, u), more commonly known as diacritical markings, a total combination of 13. These markings,
which are placed either above or below the letter, are used to clarify the sense and meaning of the word. For example, the Arabic word for flag, science, or taught is (علم). There are two ways to disambiguate: use diacritical markings, or rely on the context. The former is a foolproof scheme; however, it is absent in modern writing. This leaves us with the context, at which native speakers are fluent, though there remain cases where more than one interpretation is plausible (Azmi & Almajed, 2013). Consider the standalone example (أكلت الخبز), which could mean I ate the bread, you ate the bread (masculine), you ate the bread (feminine), or the bread was eaten. Clearly this is an important reason why most religious books are full of diacritical markings. The books for those learning Arabic as a second language are also dotted with diacritical markings.

Few studies have examined the typography of Arabic texts (Abubaker & Lu, 2012; Alsumait & Al-Osaimi, 2009; Ganayim & Ibrahim, 2013; Hemayssi, Sanchez, Moll, & Field, 2006; Ramadan, 2011). Ramadan (2011) presented a study evaluating college students' readability and comprehension of Arabic e-books, in a series of three experiments involving 49, 31, and 31 participants. In the first experiment the participants were asked to rate some good Arabic fonts from those found in the MS Windows system for reading e-passages. In the second experiment, the participants determined the best font style and font size out of those rated in the first experiment, while in the final experiment they determined the best combination of page layout and background/text color arrangement. The result revealed that Simplified Arabic font with a point size of 14pt was the best combination among the different Arabic font styles and sizes. Also, a single column text with black/white background color was the best combination for the third experiment. Interestingly, this elaborate study overlooked two important factors, text justification and the diacritical markings, which are likely to influence the readability and the comprehension of the Arabic text. Abubaker and Lu (2012) experimented for the optimum font size and type to read from screen for students aged 10–12. The study involved two fonts (Traditional Arabic and Simplified Arabic) at four different sizes: 10, 14, 16 and 18pt. A total of 30 students (10 per age group) participated in the study. The authors concluded that font sizes of 16pt and below are inappropriate for children aged 10; however, font sizes 14 and 16 are readable for those aged 12 and over. They also recommended avoiding Traditional Arabic for this age group. The study in Ganayim and Ibrahim (2013) involved 210 native Arab students. The authors found that multicolumn text affected the comprehension achievement but not the reading speed. The students reported a better comprehension for single-column text than for the multicolumn. For designing Arabic text, a reading rate of 127 words/min is to be considered. In the same study it was reported that the interline spacing had no relevance on either factor.

As we mentioned, there are many facets to typography. Elements such as selection of typeface, point size, and various spacing are fine-tuned by individual typesetters depending on the aesthetics or subject to some cultural conventions. For example, in Arabic and French it is customary to leave a space before a colon or semicolon in a sentence, while in English it is not. Text justification is one element of typography which can be automated; the other is hyphenation, something which is inappropriate for Arabic script. To the best of the authors' knowledge, there is no study on the readability and comprehension of justified Arabic text. We therefore assume that the findings of Campbell et al. (1981) for justified English text hold true for Arabic. With that in mind, in this paper we develop a sophisticated system to justify Arabic text, one which takes into account all the peculiarities of the Arabic script and at the same time conforms to traditional Arabic calligraphy. With the help of 44 participants, all university students, we evaluate our Arabic text justification system using the font we developed for this purpose against 14 point Simplified Arabic font, the best combination according to the study in Ramadan (2011). Another study which backed this font, albeit for young children, is (Abubaker & Lu, 2012). For our experiment we used plain text as well as text with full diacritical markings. The assessment revealed that our font using a sophisticated text justification algorithm beats Simplified Arabic font in both readability and comprehension tests.

The process of typesetting languages using the Arabic script is more challenging than typesetting using the Roman script because of its special requirements and strict rules. Standard software on different platforms provides appropriate tools to handle Arabic and other right-to-left languages, but leaves much to be desired in terms of quality. The flow of traditional calligraphy is said to be lost in moveable-type printing and most likely in plain computer typesetting (Andrew, 2008). According to Mulder (2007), the Ottomans did not use printing with moveable type until long after it was in use elsewhere, mostly because of reluctance to compromise the elegance of the text. During the last century there were calls to 'simplify' the Arabic writing system, e.g. one form per letter and even allowing for left-to-right ordering (Széll, 2012). These calls faded as a new generation of smart fonts came into existence, which makes it possible to mimic some of the features that one desires from a calligraphic point of view (Azmi & Alsaiari, 2009; Széll, 2012). Although Arabic typography has improved considerably over the last two decades, some aspects still need to be addressed so that its quality can match that of Arabic calligraphy as closely as possible. When justifying a Latin based text, the software relies on: (1) hyphenation; and (2) insertion of extra spaces between words. On the other hand, the Arabic based typesetting software stretches the words horizontally using kashida (كشيدة). Such a solution violates a basic principle of Arabic calligraphy where certain letters can be stretched while others can be compressed (Benatia, Elyaakoubi, & Lazrek, 2006), see Fig. 1. This set of rules as defined by the calligraphers has been maintained ever since and we would like to keep it intact into the digital age. So in developing our Arabic text justification system we aim to develop one that conforms to the rules as put forth by the calligraphers.

Fig. 1. The four letter name 'Muhammad' at different degrees of stretching (left) or compressing (right).

The rest of the paper is structured as follows. In the next section we look into the characteristics of Arabic writing. We cover related works in Section 3. In Section 4 we go over our proposed scheme. In Section 5 we look into our implementation and how it compares to output by other software. Section 6 covers the evaluation of our text justification system. Finally we conclude and suggest some future works in Section 7.

2. Characteristics of Arabic writing

As a language, Arabic predates Islam (Al-Azami, 2011, pp. 123–129). However, it was Islam that boosted it and gave it eminence because the Qur'an was revealed in it. During the early period of Islam there were no mandatory rules for Arabic script. The drawing of letters did not follow any particular style or specific rules. Since Islam prohibits the depiction of human form (aniconism), this led to Islamic art being dominated by decorative geometric patterns, and calligraphy. The latter was specially revered in Islamic arts as it was the primary means for the preservation of the Qur'an. While the art of calligraphy has often been dominated by men, there were famous women calligraphers who are known to have scribed the whole of the Qur'an (Al-Munajjid, 1995; Kazan, 2010;
Simonowitz, 2010). Among the most notable master calligraphers are Ibn Muqla (d. 940), and Yaqut Al-Musta'simi (d. 1298). It is believed that the Arabic script reached its zenith through the hands of Al-Musta'simi; and thus, Arabic writing became distinguishable by a group of features and rules that we would have to take into account in order to produce an acceptable text from the point of view of the Arabic calligrapher (Elyaakoubi & Lazrek, 2005). There are a number of characteristics and rules of Arabic script. A good awareness of these characteristics leads to professional design of Arabic type. We will briefly go over some of them.

2.1. Cursivity

Unlike Latin based writing which is made up of independent characters, Arabic only allows cursive style. This cursivity implies up to four different forms for the same letter, see Fig. 2. A form is selected based on the context.

Fig. 2. The four different forms of the Arabic letter ba. The forms are selected according to the context.

2.2. Ligatures

Due to the cursive nature of writing, Arabic script is extremely rich in ligatures. Some of the ligatures are mandatory while others are optional and exist only for aesthetic reasons, legibility or justification (Lazrek, 2007). The ligatures can appear in various degrees, see Fig. 3.

Fig. 3. Ligatures in Arabic can appear in various degrees.

2.3. Diacritic dot

Diacritic dots are a measurement unit marked by the feather of the calligraphy pen used (Benatia et al., 2006; Elyaakoubi & Lazrek, 2005). The semantic role of diacritic dots is that certain letters are characterized by the presence, number and positions of these dots. For example the three different letters ب, ت and ث all share the same basic glyph. The diacritic dot is also used by calligraphers as a measurement unit to regularize the dimensions and the metrics of glyphs, Fig. 4.

Fig. 4. Using dots as a measurement unit for the letter alif.

2.4. Diacritic signs

Diacritic signs (or short vowels) are markings added above or below the letters to aid in proper pronunciation of the purely consonantal text (Fig. 5). The diacritic signs take different heights, not only with respect to basic glyphs but also according to other contextual elements.

Fig. 5. The same word ktb with different diacritic signs (colored in red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

2.5. Allograph

These are the different graphical forms a letter can have while keeping its place, i.e. initial, middle, final and isolated. The form depends on the neighboring letters and the presence of kashida. For example, the initial form of the letter ba can take three different allograph shapes (Fig. 6) according to its left neighboring letter (Benatia et al., 2006).

Fig. 6. Three different allographic shapes of the initial letter ba. The diacritic dots are used to specify the height.

2.6. Kashida

Kashida is a connection between Arabic letters; it is not a separate character but a curvilinear connection with the previous letter. It is primarily used for emphasis, legibility, aesthetics and justification (Berry, 1999). As with ligatures, the kashida comes in various degrees (stretches), see Fig. 1. The stretching of a letter is not haphazard, but rather follows a set of strict rules which defines the priorities and the degrees to which a letter can be extended. This set of rules is stored in what is known as the kashida elongation matrix (Table 1) (Benatia et al., 2006). For example, if the letter [ب] is followed by the letter [ط] then it can be stretched by a single diacritic dot (Section 2.3) and up to 12 diacritic dots, which is the

Table 1
The kashida elongation matrix, i.e. the degrees of extension. A '+' indicates that elongation (stretching) is highly recommended, a '−' indicates that elongation is allowed but discouraged, and an unsigned number means elongation is allowed. Blank entries mean elongation is prohibited. To reduce the size of the table, letters are grouped into classes. A letter inside [ ] denotes the class of characters that share the same skeleton, e.g. [ب] stands for the letters ب، ت، ث.

Previous letter Current letter


‫ﺃ‬ [‫]ﺏ‬ [‫]ﺝ‬ [‫]ﺩ‬ [‫]ﺭ‬ [‫]ﺱ‬ [‫]ﺹ‬ [‫]ﻁ‬ [‫]ﻉ‬ ‫ﻑ‬ ‫ﻕ‬ ‫ﻙ‬ ‫ﻝ‬ ‫ﻡ‬ ‫ﻥ‬ ‫ﻫـ‬ ‫ﻭ‬
[‫]ﺏ‬ 5 1 1 1 1 +1 1 1 1 1
[‫]ﺝ‬ 1 1 1 1 +1 1 1 1 1 1 1 1
[‫]ﺱ[]ﺹ[]ﻁ‬ 3 1 2 1 1 1 2 2 1 1 1 1 2 2 1 2 1 1
[‫]ﻉ‬ 1 1 1 1 +1 1 1 1 1 1
‫ﻑﻕ‬ 1 1 1 1 1 +1 1 1 1 1
‫ﻙ‬ 3 1 1 1 1 1 1 1 1 1 1 1
‫ﻝ‬ 3 1 1 1 1 1 1
‫ﻡ‬ 1 1 1 1 1 1 1 1 1 1 1 1
‫ﻫـ‬ 1 1 1 1 1

limit any letter can be stretched by. If on the other hand we have [ب] followed by the letter [س] then stretching is not allowed. In proper Arabic calligraphy, the kashida must resemble a curvilinear segment between letters. Using a simple straight line segment for a kashida is aesthetically unacceptable.

3. Related work

There are few works that cover Arabic typography specifically from the implementation point of view. Most of what we have seen is work on issues and problems with Arabic typography.

The original TeX (Knuth, 1986) paragraphing algorithm worked at the level of paragraphs rather than lines. The idea was to model each paragraph as a graph of vertices. The first node of the graph is located at the beginning of the paragraph. Then, it locates the second node at the point that gives an appropriate width of line, by computing glyph widths starting from the first node. The first node may connect to more than one node because the spaces between words vary and there is more than one way to break the line. Each arrow between two vertices has a badness value associated with it. The badness value is simply the difference between the actual length of the line and the ideal length. The new nodes are activated to be the starting point to search for new nodes. At the end, the paragraph is justified by following the shortest path in the graph. Initially, TeX was released for Roman based languages only. Later, Knuth and MacKay (1987) presented a working solution for including right-to-left text (for Arabic and Hebrew) in the TeX family. Their proposed TeX–XeT system is an extension of TeX. Within the TeX extensions, both Omega and ArabTeX (Lagally, 1992) have been used for Arabic and have managed to meet some of the basic requirements. ArabTeX had to compromise on the issue of text justification. In practice for right-to-left text, ArabTeX handled line breaking itself, bypassing the original TeX's working approach that was used for the Latin script (Sherif & Fahmy, 2007).

Haralambous (2006) presented an infrastructure for typesetting in Arabic script. This infrastructure is based on four tools: the concept of texteme, OpenType fonts, Omega2 modules and an extended version of TeX's line-breaking graph. A texteme is an atomic unit of text consisting of key-value pairs such as Unicode position. Out of this infrastructure the author created an extended TeX graph. Applying the functionality of this infrastructure at each step of Arabic text processing will create an Arabic texteme containing all the information accumulated through the Arabic text processing steps, as well as the initial information: Unicode characters, contextual forms, etc. The author claims that using the extended TeX graph along with textemes will prove useful for the Arabic script; however, it was not clear how it would be applicable for proper text justification other than ligature substitution.

AlQalam (Fahmy, 2006) evolved as an upgrade to ArabTeX. Hence, it inherits ArabTeX's good features. It was mainly intended for typesetting the Qur'an and traditional texts using the Arabic script. Sherif and Fahmy (2007) presented a 'parameterized Arabic font' for the AlQalam system, which they claim is a better way of working with Arabic script. To achieve an output quality close to that of Arabic calligraphers, they tried to model the pen nib and the way it is used to draw curves as closely to the ideal as possible using METAFONT. Parameterized fonts were also introduced for a more flexible and dynamic combination of glyphs, to be used in forming ligatures and in drawing whole words as single entities.

Ditroff/ffortid is a system for formatting bi-directional text in Arabic, Hebrew and Persian (Berry, 1999). The system is able to format mixed left-to-right and right-to-left texts using fonts with isolated letters or with connecting letters, and supports only connection stretching, achieved by repeating straight fixed-length baseline fillers.

There are many Arabic word processors on different platforms. These word processors were originally built for Latin and were then adapted to handle Arabic script according to the system environment. Microsoft Word supports Arabic. The justification of Arabic text in Word is based on a combination of word spacing variation and kashida. Word places the kashida according to a priority scheme that indicates where kashida is to be placed automatically, usually only once between two letters in a word. It determines the highest priority character in the word to be stretched using the kashida. In case two letters have the same priority, the kashida is placed towards the end of the word (Smitshuijzen, 2007). In fact, an option within Word allows it to justify Arabic text using only word spacing, similar to what it does in Latin. Another notable software is Adobe ME. The way kashidas are placed in the text seems to be mysterious or random in the Adobe ME software. There could be some logic behind it but it is not obvious (Smitshuijzen, 2007).

4. Our proposed scheme

The development of Arabic document processing tools must be formalized according to the Arabic handwriting rules. Owing to the imposed rules, Arabic script does not enjoy the same luxury that the Latin script does when it comes to automated typesetting on computers. The cursivity of the Arabic script means hyphenation is unacceptable (Elyaakoubi & Lazrek, 2005). Arab calligraphers treat the composition of text in a specific way where certain letters can be stretched while others can be compressed. To justify a line, the calligraphers compose some ligatures to get a narrower text, or decompose some ligatures and extend certain letters to get a wider text. This process occurs according to the available space in the line. Forming ligatures is evident in words such as بحر, which is composed into a ligature to
shrink the word if there is a lack of available space in the line. Stretching a letter is apparent in the first letter of the word قـــــطار to get a more elongated word. According to the Arabic calligraphy rules, whether a letter can be stretched depends on the letter itself and the preceding letter. Both of the operations, compression and stretching, can be done in various degrees. The rules are summarized in Table 1, which is known as the kashida elongation matrix.

Our proposed text justification scheme is a two level process. First, we substitute the composed ligatures with alternative forms of less/more width according to the available space. Second, we repeatedly apply the kashida according to the kashida elongation matrix till the lines reach the appropriate width. Most of the available Arabic fonts support limited ligatures, which means they are poorly suited for our proposed algorithm. So before implementing the algorithm we need to prepare an appropriate font. Designing a font from scratch is a time consuming process and requires a professional design artist. We decided to pick an existing font and edit it by introducing some new ligatures. In this section we will start with our font development and later move to the algorithm.

4.1. Font enhancement

Font development is a fairly cumbersome task. So rather than developing a font from scratch we decided to pick one of the available fonts. We picked an early version of the highly acclaimed Arabic Typesetting Font (ATF). This OpenType font is based on the Naskh writing style and was developed by M. Sakkal for Microsoft Corporation with the assistance of P. Nelson and J. Hudson. The font differs slightly from the official version which was distributed with Microsoft Office 2007. This prerelease version of the font comes with all the OpenType tables, which is not the case with the official release.

The font contains over 2100 glyphs, including contextual alternates, ligatures, and language specific forms. Great care and consistency have been applied in the glyphs used for many other languages, e.g. Persian and Urdu, that use a variant of the basic Arabic script. In the development phase we reviewed all the existing ligatures and then added some new ligatures and glyphs. We noticed that most of the ligatures in the ATF font are confined to isolated and initial forms and a few final forms. Some of these ligatures are able to accept the medial forms. The medial form of a ligature is important because of its ability to shorten the word (make it less horizontal) in case of dynamic justification. Fig. 7 shows some of the new ligatures which we added into the ATF font. Our aim is to achieve high quality Arabic typography. This is realized by the presence of aesthetic ligatures as well as applying the calligraphic rules in justifying text. Since stretching the letters with different elongations is part of the justification process, we added to the ATF font a number of kashidas with different elongations to use them later when justifying the text.

Following the introduction of new ligatures into ATF, we need to inform the font when to compose these ligatures. This facility is available through the OpenType glyph substitution feature. For this we used VOLT, a free tool by Microsoft. VOLT (Visual OpenType Layout Tool) (www.microsoft.com/typography/volt.mspx) allows for visual composition. For example, if the glyph uniFD8A is followed by the glyph uniFEAA then these two glyphs are substituted by a single glyph, glyph2940. In this way we determine when to compose each one of these ligatures. The font is ready for use once it is compiled.

4.2. Our algorithm

We consider a document in a word processor as simply a set of paragraphs, with each paragraph ending with a newline character. The algorithm greedily puts as many words on a line as possible. To justify a line, the algorithm computes its badness value, which is the value resulting from subtracting the width of the current line from the specific total width (which is the maximum width of a line). The badness value is always positive because the program will not allow reading a line larger than the specific total width. After computing the badness value, the algorithm runs its justifying process until the badness value becomes zero; in practice, when the badness falls below a certain threshold we treat it as zero. The process of justification is a two-step procedure: the substitution step and the kashida step.

4.2.1. Substitution step

After the badness is computed, the algorithm starts looking for any character that has an alternative glyph of a wider width. It only considers the alternative glyphs whose extra width is equal to or less than the badness value. From all the alternative glyphs that have been found, it picks the one with the largest width. This ensures we minimize the search for alternative glyphs that can fit the badness value. After determining the best alternative glyph, the algorithm re-computes the badness value. The process of looking for a new glyph and replacing it continues till the badness value becomes zero or there are no more appropriate alternative glyphs that can fit the badness value. In the latter case, the algorithm proceeds to the next level, that is, working with kashida.

4.2.2. Kashida step

In this step, the algorithm determines the best positions to insert the kashida. The goal is to fill in the gaps and reduce the badness value to zero.

We set priorities as to which word(s) should have the kashida. The shorter the word, the higher priority it has to have a kashida (Table 2). Four letter words have the highest priority, followed by those having five letters and so on. The priority for which words should be stretched first follows the general calligraphic rule: kashida appears more frequently in four letter words, less in five letter words and even less in six letter words. Kashida is not recommended for two or three letter words, with some exceptions in cases such as (سر) and (صر) for two letter words and (بسم), a three letter word.

Once we have determined which words should have the kashida, we next need to decide which character(s) within the word should be stretched. For this we need to consult the kashida elongation matrix (Table 1). The entries with a positive sign have the highest priority (which we set as priority 3), followed by entries with no sign (priority 2). The entries with a negative sign have the least priority, i.e. priority 1.

The process of determining the best position for kashida starts by first looking for the word with the highest priority for having the kashida. Then it looks for the character in this word that has the highest priority to be stretched by the kashida, according to the kashida elongation matrix. In case there are no appropriate characters in the current word, we move to the next word with the highest priority and so on. When the best position for kashida has been determined according to the previous scenario, the algorithm takes the badness value as a parameter and inserts the kashida in the determined position. The elongation of the kashida is determined by the badness value, but in no case can it exceed 12 times its entry in the kashida elongation matrix. After the kashida is inserted, the badness value is recomputed. We continue looking for the next place to insert kashida until the badness value becomes zero. This process is repeated from the top of the document till its end. Table 3 is the pseudo code of our justification algorithm. Fig. 8 illustrates our algorithm through an example.
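
The two steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their pseudo code is Table 3). The `Word` and `Slot` classes, the `eps` threshold value, and the per-slot `max_stretch` bound are hypothetical names introduced here for illustration; the word priorities follow Table 2, and the sign priorities 3, 2 and 1 follow the text. Each entry in `alternates` is assumed to be the extra width gained by swapping one distinct character for its wider alternative glyph.

```python
from dataclasses import dataclass, field

# Word-length priorities as listed in Table 2. Words of other lengths are
# simply skipped in this sketch (an assumption; the text notes exceptions).
WORD_PRIORITY = {4: 4, 5: 3, 6: 2, 7: 1}


@dataclass
class Slot:
    """Hypothetical stretchable position between two letters of a word."""
    sign_priority: int    # 3 for '+', 2 for unsigned, 1 for '-' matrix entries
    max_stretch: float    # assumed upper bound on the kashida at this position
    stretch: float = 0.0  # elongation applied so far


@dataclass
class Word:
    width: float
    letters: int
    alternates: list = field(default_factory=list)  # extra widths of wider glyphs
    slots: list = field(default_factory=list)


def line_width(words, space):
    """Total width of the words, their kashidas, and the inter-word spaces."""
    glyphs = sum(w.width + sum(s.stretch for s in w.slots) for w in words)
    return glyphs + space * (len(words) - 1)


def justify_line(words, space, max_width, eps=0.5):
    """Stretch a greedily filled line; return the badness that is left over."""
    badness = max_width - line_width(words, space)

    # Substitution step: repeatedly take the widest alternate that still fits.
    while badness > eps:
        fits = [(gain, i) for i, w in enumerate(words)
                for gain in w.alternates if gain <= badness]
        if not fits:
            break
        gain, i = max(fits)
        words[i].alternates.remove(gain)
        words[i].width += gain
        badness = max_width - line_width(words, space)

    # Kashida step: shorter words first, then '+' before unsigned before '-'.
    ranked = sorted((w for w in words if w.letters in WORD_PRIORITY),
                    key=lambda w: WORD_PRIORITY[w.letters], reverse=True)
    for w in ranked:
        for s in sorted(w.slots, key=lambda s: s.sign_priority, reverse=True):
            if badness <= eps:
                return badness
            add = min(badness, s.max_stretch - s.stretch)
            s.stretch += add
            badness -= add
    return badness
```

Calling `justify_line(words, space, max_width)` on a line that was filled greedily mutates the words in place and returns whatever badness could not be absorbed. A returned value above `eps` means the line ran out of both alternates and stretchable positions.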

Fig. 7. Some of the new ligatures that were added into the ATF font.

Table 2
Priority levels of words for having a kashida.

Word length      Priority
Four letters     4
Five letters     3
Six letters      2
Seven letters    1

5. Implementation

The algorithm cannot work by itself; it has to be integrated with a word processor or text editor. The editing software performs its typical tasks, and our algorithm is called to justify the text before it is rendered for screen output. For our implementation we used ‘neatpad’ (www.catch22.net/tuts/neatpad), an open source text editor that supports Arabic.

Any software that renders text is composed of three basic components: the text editor, the font engine, and the font typeface (Fig. 9). In (1), the system intercepts the stream of keyboard character codes and sends them to the font engine; these character codes are translated into a stream of glyphs through the font’s tables (2); the glyphs are returned to the editor (3) for rendering on the screen, and on their way to the editor the justification algorithm kicks in (4). The justification of the text is based on measuring the width of the glyphs before rendering them on the screen. Since the algorithm sits between the font engine and the application, at the point where the stream of glyphs is returned, this translates to a line-wise justification of text. That is, our algorithm justifies text at the line level rather than at the paragraph level. The algorithm was implemented using Visual C++ v6.0.

Next we go over some sample output of our algorithm. For comparison purposes, each sample is rendered by the standard built-in justification in Microsoft Word and by our justification algorithm. The first sample (Fig. 10) is plain text, while the other one (Fig. 11) is fully vocalized text (with diacritical marking).

In Fig. 10a we note that MS Word had an under-filled top line. At times, Word resorts to extra space for justification. This is against the established rules. We marked these extra spaces with blue vertical lines as seen in the second, third and the sixth lines. The short red horizontal lines mark the incorrect places where Word used kashida. For example, at the end of the second line there is a kashida after the letter noon which is followed by the letter ta’. This is a five letter word. According to the rules, preference is given to four letter words over longer ones for having kashida. Our algorithm placed the kashida correctly in the first four letter word in the line (Fig. 10b). At the fourth line (Fig. 10a) we note that Word used kashida at several places incorrectly. Fig. 11 shows the justification of fully diacritized text. Again we note the many places where Word put kashida at the wrong place. Some more samples of justified text using the algorithm are in Fig. 12.

6. Evaluation of our system

The purpose of this experiment was to evaluate our Arabic text justification algorithm, a system which closely follows the rules set by master calligraphers. As we are dealing with e-text, it is wise to compare our justification system with the one used in Microsoft Word. For the experiment we prepared four different sample passages of comparable sizes, each approximately 170 words long. These were divided into two sets. In the first set we have two passages written in ATF font, with the text being justified using our algorithm. The ATF font was developed as part of our justification algorithm as it contains many glyphs that are necessary for the proper working of the algorithm (see Section 4). In the other set we have two passages that are written in Simplified Arabic Font (SAF), and justified using the standard algorithm in MS Word. According to the studies (Abubaker & Lu, 2012; Ramadan, 2011), the SAF font achieved better results for e-text readability and comprehension. In each set we have one passage that is plain (no diacritical markings), and the other one with full diacritical markings. Following the studies in Ganayim and Ibrahim (2013) and Ramadan (2011), all the texts were formatted in a single column. The objective of the experiment is to determine the performance of our justification algorithm in both passages, plain and fully diacritized, in terms of reading speed and reading comprehension.

We have two Arabic font styles, ATF and SAF, along with two types of texts, diacritic and non-diacritic (plain) text. These are considered the main independent variables in our experiment to

Table 3
Pseudocode of our text justification algorithm. This algorithm justifies each line separately.

Input: a single line of text T

badness_value ← maximum line width − current width of text T
// substitution step
while (badness_value > 0 AND curr_character ≠ end_of_line) do
    if (we have alternative glyph for curr_character) then
        Δ ← width of current alternative − width of curr_character
        if (Δ ≤ badness_value) then
            replace curr_character by its alternative form
            update badness_value
        end if
    end if
    get next character
end while
// kashida step
if (badness_value ≠ 0) then
    for word_level_priority = 4 down to 1 do
        curr_word ← first word in T
        while (curr_word ≠ end_of_line AND badness_value > 0) do
            if (priority of curr_word = word_level_priority) then
                for char_level_priority = 3 down to 1 do
                    curr_character ← next char in curr_word with priority = char_level_priority
                    if (such a character exists) then
                        kashida_length ← min(12 * size of diacritical dot, badness_value)
                        add kashida after curr_character
                        update badness_value
                    end if
                end for
            end if
            get next word
        end while
    end for
end if
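
The pseudocode in Table 3 can be sketched in Python as follows. This is an illustrative sketch, not the authors' Visual C++ implementation. The Glyph record, its field names, and the parameters space and dot are hypothetical; the word length is assumed to equal the number of base glyphs in the word; and the extra condition that a substitution must actually widen the line (delta > 0) is an added assumption that Table 3 does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    name: str
    width: float
    alt_width: Optional[float] = None  # width of the alternative form, if any
    kashida_priority: int = 0          # 0 = no kashida allowed after this glyph
    kashida: float = 0.0               # elongation inserted after this glyph


# Table 2: number of letters in the word -> word-level priority.
WORD_PRIORITY = {4: 4, 5: 3, 6: 2, 7: 1}


def line_width(words: List[List[Glyph]], space: float) -> float:
    glyph_total = sum(g.width + g.kashida for word in words for g in word)
    return glyph_total + space * max(len(words) - 1, 0)


def justify_line(words: List[List[Glyph]], max_width: float,
                 space: float = 1.0, dot: float = 1.0) -> float:
    """Stretch one line in place and return the remaining badness."""
    badness = max_width - line_width(words, space)

    # Substitution step: swap in wider alternative glyphs while they fit.
    for g in [g for word in words for g in word]:
        if badness <= 0:
            break
        if g.alt_width is not None:
            delta = g.alt_width - g.width
            if 0 < delta <= badness:  # delta > 0 is an added assumption
                g.width, g.alt_width = g.alt_width, None
                badness -= delta

    # Kashida step: highest word priority first, then character priority.
    for word_priority in (4, 3, 2, 1):
        for word in words:
            if WORD_PRIORITY.get(len(word), 0) != word_priority:
                continue
            for char_priority in (3, 2, 1):
                if badness <= 0:
                    return badness
                target = next((g for g in word
                               if g.kashida_priority == char_priority
                               and g.kashida == 0.0), None)
                if target is not None:
                    target.kashida = min(12 * dot, badness)
                    badness -= target.kashida
    return badness
```

A non-zero return value means the line could not be fully justified with the glyphs and kashida positions available, a case the pseudocode leaves open.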

determine the performance with regard to participant’s reading speed and comprehension. Participant’s reading speed and reading comprehension are the two dependent measures collected in this experiment to evaluate their performance. The two measures were calculated as follows: (1) reading speed was calculated by dividing the total number of words in each passage by the total elapsed time in reading it, expressed in words/min; and (2) reading comprehension level, which reflects how well the subject understood the text material, was calculated by dividing the total number of correct answers by the total number of questions in each treatment, expressed as percent of correct answers.

Fig. 8. An example illustrating our two step justification algorithm: (a) the original text, (b) after substitution step, and (c) after kashida step.

For the experiment we developed an application which asks the subject to enter their name; this is followed by an opening screen with instructions, and four buttons each marking a treatment. In each treatment the subject is presented with a passage to read, followed by some questions on the passage. The subject reads the passage at their own pace, and once read, they are presented with four true/false questions related to the passage read. This is used to reflect how well the subject comprehended what they read. There is no time limit for answering the questions. The subject moves to the next treatment once they are done answering the previous treatment’s questions. An internal clock keeps track of the exact time the subject required to read the passage in each treatment. At the end, the system saves the data for each subject. The data includes the subject’s name, the time to read each passage, and the number of correct answers in each treatment. The passages in the treatments were as follows. In the first treatment the passage is in ATF font with diacritical marked text; in the second treatment it is in ATF font with plain text; for the third treatment it is in SAF font with diacritical marked text; and in the last treatment it is in SAF font with plain text.

A total of 44 participants took part in the study. All of them were native Arabic speakers, university students, both male and female, aged 20–30 years. For the statistical analysis we used a commercial package, GraphPad Prism 6 (www.graphpad.com), to test the dependent measures. As the same subject in our experiment received four treatments, we performed repeated measures 2-way ANOVA to test three hypotheses: the main effect of font type (ATF vs SAF), the main effect of text status (plain vs diacritical marked text), and their interaction on subject’s performance in terms of reading speed and reading comprehension.

6.1. Reading speed

Font type effect was significant at p-value <0.0001, F(1, 43) = 43.04. Text status and the interaction of font type by text status were not significant. The text status was not significant, F(1, 43) = 3.11 at p-value = 0.085. Also, the interaction of font type by text status was not significant, F(1, 43) = 1.09 at p-value = 0.302, which is greater than our significance level of 0.05. As shown in Table 4 and Fig. 13, participants read significantly faster with plain text in ATF font, with a mean of 164 words/min. And in general, they read faster in ATF font for both kinds of text (plain and diacritical marked text) when compared to SAF font.

6.2. Reading comprehension

Font type effect was significant at p-value = 0.0195, F(1, 43) = 5.885. The text status and the interaction of font type with text

Fig. 9. Integrating our justification algorithm with the open source Text Editor,
neatpad. Numbers inside the figure will help explain the data flow between the
components (see text).

Fig. 11. A sample of fully diacritized text justified using (a) standard MS Word, and (b) our algorithm. The colored marks indicate places of incorrect rendering. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. A sample text justified using (a) standard MS Word, and (b) neatpad using
our justification algorithm. The blue blocks and red underlines mark places where
rendering is incorrect. (For interpretation of the references to color in this figure
legend, the reader is referred to the web version of this article.)
Fig. 12. Some more samples of justified text using our algorithm.

status were not significant. Text status was not significant, F(1, 43) = 2.77 at p-value = 0.1031. The interaction of font type by text status was not significant, F(1, 43) = 0.839 at p-value = 0.3648, which is greater than our significance level of 0.05. As shown in Table 4 and Fig. 14, participants understood text written in ATF font significantly better, irrespective of it being plain or with diacritical marking, when compared with text written in SAF font.

6.3. Discussion

Obviously, there are always some differences in results between studies. Our study reported participants read plain text in SAF font with a mean speed of 129 words/min, and this is slightly more than what was reported by Ramadan (2011), a mean speed of 119 words/min. We believe this difference is natural due to the nature of the text, its size, and the participants themselves. And just to put things into perspective, consider the reading speed in other studies: a mean speed of 138 words/min is reported in (Trauzettel-Klosinski & Dietz, 2012), while Ganayim and Ibrahim (2013) recommend 127 words/min when designing Arabic text for students. This gives confidence in our results. We can safely claim that our text justification scheme improves the reading speed, as well as reading comprehension. Another interesting outcome is that the diacritical marking had an adverse effect on both reading speed and text comprehension. This was true across both font types, though statistically the result could not be confirmed. This result is consistent with the finding in Ibrahim (2013).
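
The two dependent measures defined at the start of Section 6, and the "mean value (standard error)" entries reported in Table 4, can be computed as in the following Python sketch. The function names are hypothetical and the sketch is not the authors' analysis code (they used GraphPad Prism 6); it only restates the stated formulas.

```python
from math import sqrt
from statistics import mean, stdev


def reading_speed_wpm(word_count, seconds):
    """Words in the passage divided by elapsed reading time, in words/min."""
    return word_count / (seconds / 60.0)


def comprehension_percent(correct, total_questions):
    """Correct answers divided by number of questions, as a percentage."""
    return 100.0 * correct / total_questions


def mean_and_standard_error(values):
    """Sample mean and standard error of the mean (sample std / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))
```

For instance, a hypothetical subject who reads a 170-word passage in 60 s and answers three of the four true/false questions correctly would score 170 words/min and 75%.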

Table 4
Summary of the test results using α = 0.05 (N = 44).

Reading speed (words/min)
                           ATF              SAF
Plain text                 164.27 (7.57)    128.98 (5.47)
Diacritical marked text    158.14 (7.49)    115.43 (6.61)
F (font type) = 43.04*; F (text status) = 3.11**

Reading comprehension (% correct answers)
                           ATF              SAF
Plain text                 80.11 (2.4)      74.43 (4.1)
Diacritical marked text    77.84 (2.6)      67.61 (3.3)
F (font type) = 5.885***; F (text status) = 2.774****

Entries expressed as mean value (standard error).
* p < 0.0001.
** p = 0.085.
*** p = 0.0195.
**** p = 0.1031.

Fig. 13. Reading speed (words/min) with standard error of means for ATF and SAF fonts for plain and diacritic text.

Fig. 14. Reading comprehension level as a percent of correct answers for ATF and SAF fonts for plain and diacritic text.

We foresee one limitation in this study which requires further investigation. It has to do with the impact of the diacritical marking on the reading speed. The recent study in Ibrahim (2013) found that native Arab children read faster when the text was unvowelized, a finding the author described as unexpected and inconsistent with several other relevant studies. So, though our study seconds the finding in Ibrahim (2013), we too are not comfortable with the results. We believe it has to do with the nature of the sample text used in the study. Even though native speakers are good at disambiguating the meaning through context, there remain cases where context alone is insufficient (Azmi and Almajed, 2013). We believe the majority of the text can easily be resolved through context. This calls for moving away from fully diacritized text and instead going for partially diacritized text, where we mark only those letters that have a profound impact on the semantics of the text. And this is what makes this investigation more challenging.

7. Conclusion and future work

In this paper we presented an algorithm that justifies Arabic text using agreed-upon calligraphic rules. It is a two-step process. In the first step we use glyph substitution through composing/decomposing of the ligatures. In the next step we resort to kashida to fill in the under-filled lines. The use of kashida (stretching of letters) is again dictated by calligraphic rules which are stored in what is known as the kashida elongation matrix. We implemented our algorithm and compared its output of justified text with the standard justification used by MS Word, and found that our algorithm yields results closer to the calligraphers’ sense of properly justified text. Furthermore, we explored how this justification impacted the reading speed and comprehension of Arabic e-text. The experiment on 44 university students revealed that our justified text improved both their reading speed and their comprehension.

Future studies could further improve upon the font we developed by introducing new glyphs, which should enhance the performance of the text justification algorithm. We suggest extending this study to cover printed material as well. Another study should examine the role of diacritical marking in reading speed and text comprehension.

Acknowledgments

The authors would like to thank the participants who made this work possible. We would also like to thank the anonymous reviewers for their suggestions which helped us improve our presentation. This work was supported by a special fund in the Research Center of the College of Computer and Information Sciences (CCIS) at King Saud University.

References

Abubaker, A., & Lu, J. (2012). The optimum font size and type for students aged 9–12 reading Arabic characters on screen: A case study. Journal of Physics: Conference Series, 364.
Al-Azami, M. M. (2011). The history of the Qur’anic text: From revelation to compilation (2nd ed.). Sherwood Park, Alberta: Al-Qalam Publishing.
Al-Munajjid, S. (1995). Women’s roles in the art of Arabic calligraphy. In G. N. Atiyeh (Ed.), The book in the Islamic world: The written word and communication in the Middle East (pp. 141–148). Albany: State Univ. of New York Press.
Alsumait, A., & Al-Osaimi, A. (2009). Arab children’s reading preference for different online fonts. In J.A. Jacko (Ed.), Human–computer interaction: Part IV. HCII, LNCS 5613 (pp. 3–11).
Andrew, A. (2008). Arabic calligraphy: Phishing. Kybernetes, 37(6), 729–731.
Azmi, A., & Almajed, R. S. (2013). A survey of automatic Arabic diacritization techniques. Natural Language Engineering. http://dx.doi.org/10.1017/S1351324913000284.
Azmi, A., & Alsaiari, A. (2009). Arabic typography: A survey. International Journal of Electrical and Computer Sciences, 9(10), 16–22.
Benatia, M. J. E., Elyaakoubi, M., & Lazrek, A. (2006). Arabic text justification. TUGboat, 27(2), 137–146 (Proc. 2006 Annual Meeting).
Bernard, M. L., Lida, B., Riley, S., Hackler, T., & Janzen, K. (2002). A comparison of popular online fonts: Which size and type is best? Usability News, 4(4), 1–11.
Berry, D. M. (1999). Stretching letter and slanted-baseline formatting for Arabic, Hebrew and Persian with ditroff/ffortid and dynamic PostScript fonts. Software Practice and Experience, 29(15), 1417–1457.

Campbell, A. J., Marchetti, F. M., & Mewhort, D. J. K. (1981). Reading speed and text production – A note on right-justification techniques. Ergonomics, 24(8), 633–640.
Dyson, M. C. (2004). How physical text layout affects reading from screen. Behaviour and Information Technology, 23, 377–393.
Elyaakoubi, M., & Lazrek, A. (2005). Arabic scientific e-document typography. In Proc. 5th int. conf. on human system learning (ICHSL5) (pp. 241–252).
Fahmy, H. A. H. (2006). AlQalam for typesetting traditional Arabic texts. TUGboat, 27(2), 159–166 (Proc. 2006 Annual Meeting).
Fuchs, L. S., Fuchs, D., Hosp, M. K., & Jenkins, J. R. (2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5, 241–258.
Ganayim, D., & Ibrahim, R. (2013). How do typographical factors affect reading text and comprehension performance in Arabic? Human Factors: The Journal of Human Factors and Ergonomics Society, 55(2), 323–332.
Haralambous, Y. (2006). Infrastructure for high-quality Arabic typesetting. TUGboat, 27(2), 1001–1009 (Proc. 2006 Annual Meeting).
Hemayssi, H., Sanchez, E., Moll, R., & Field, C. (2006). Designing an Arabic user interface: Methods and technique for bridging cultures. User Experience, 5(1), 4–9.
Ibrahim, R. (2013). Reading in Arabic: New evidence for the role of vowel signs. Creative Education, 4(4), 248–253.
Kazan, H. (2010). Female calligraphers: Past & present. Traditional arts series no. VI. Istanbul: Cultural Co.
Knuth, D. E. (1986). The TeXbook: Volume A of computers & typesetting. Reading, MA: Addison Wesley.
Knuth, D. E., & MacKay, P. (1987). Mixing right-to-left texts with left-to-right texts. TUGboat, 8(1), 15–25.
Lagally, K. (1992). ArabTeX, a system for typesetting Arabic. Technical report 6/92. Fakultät Informatik, Universität Stuttgart.
Lazrek, A. (2007). Fonts for Arabic scientific documents. In Conference on information technology, and Arabic and Islamic studies, Imam University, Riyadh, March 6–7 (in Arabic).
Mulder, E. (2007). Keyboard calligraphy. Saudi Aramco World, 58(4), 34–39.
Ramadan, M. Z. (2011). Evaluating college students’ performance of Arabic typeface style, font size, page layout and foreground/background color combinations of e-book materials. Journal of King Saud University – Engineering Sciences, 23, 89–100.
Sherif, A. M., & Fahmy, H. A. H. (2007). Parameterized Arabic font development for AlQalam. TUGboat, 29(1), 79–88 (Proc. XVII EuroTeX).
Simonowitz, D. (2010). A modern master of Islamic calligraphy and her peers. Journal of Middle East Women’s Studies, 6(1), 75–102.
Smitshuijzen, E. (2007). The big kashida secret. <http://www.khtt.net/page/923/en> Accessed 28.04.13.
Széll, M. (2012). Westernizing Arabic: Attempts to ‘simplify’ the Arabic script. Tipográfiai Diákkonferencia 2011/2012.
Tan, A., & Nicholson, T. (1997). Flashcard revisited: Training poor readers to read words faster improves their comprehension of text. Journal of Educational Psychology, 59, 276–288.
Trauzettel-Klosinski, S., & Dietz, K. (2012). Standardized assessment of reading performance: The new international reading speed texts IReST. Investigative Ophthalmology and Visual Science (IOVS), 53(9), 5452–5461.
