I. INTRODUCTION
Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) [1] [2] are widely employed in the online procedures of Web applications, such as Google's or Yahoo's free e-mail services, to ensure that a user is a human and not an automatic program, as shown in Fig. 1. The basic idea of a CAPTCHA is to resist decoding by segmentation and character recognition algorithms, or to provide permission levels for search engines [3]. Some users apply automatic programs to sign up for thousands of e-mail accounts every minute in order to break into or disrupt systems. Once successful, they gain permissions on those systems, such as the ability to send out thousands of e-mails in a certain period. Investigating CAPTCHA defects therefore helps improve robustness and security against attacks from automatic programs. The preliminary version of CAPTCHA randomly selects a word from a dictionary, renders it as an image, and then adds distortion and a noisy image background. The user is then asked to decode and verify the word appearing in the image. Because of the word's deformation, most humans succeed at this test, while optical character recognition (OCR) programs fail it. Currently, however, existing OCR programs have been developed based on artificial intelligence (AI) schemes that are able to screen the examination and generate a set of candidate words [4]. Therefore, the technique of randomly selecting a word from storage has become easy to break. Random-character, graphic-image, and audio CAPTCHAs are possible solutions for increasing the robustness and security of protection systems [5]. This paper focuses on random-character, text-based schemes, since commercial applications use them to counter malicious programs.
978-1-4799-3855-1/14/$31.00 2014 IEEE
II. STATE OF THE ART
Sound-based CAPTCHAs rely on human auditory perception to identify words or letters in a sound clip that is distorted and mixed with background noise. A typical sound-based CAPTCHA is reCAPTCHA [7]. The combination of audio and a visual image was introduced by Graig Sauer et al. [8] [9]. However, automated attacks on the audio-based CAPTCHAs of popular Web sites have achieved accuracy of up to 71% [10], and such CAPTCHAs are difficult and time-consuming to solve [11].
A CAPTCHA for smartphones and tablet PCs was introduced by Mitsuo Okada et al. [15] to prevent OCR programs from bypassing verification. The technique used multiple noise images in which invisible objects or texts were hidden within a certain area. A design based on color, usability, and security was examined by A. S. E. Ahmad et al. [16]; their CAPTCHA image uses a simple color scheme to increase usability while avoiding potentially complicated usability and security consequences. J. S. Lee and M. H. Hsieh [17] proposed an effective method to help confirm the authentication process within a network system.
1) Colour-to-Gray Conversion: this step converts a red-green-blue (RGB) image or a color-map image to a grayscale image, and preserving the luminance of the RGB image is important. Several conversion methods exist, such as the maximum, average, and weighted average methods. Here, the weighted average method is used, with the luminance weights of the R, G, and B components fixed at 0.2989, 0.5870, and 0.1140, respectively. A gray value (GV) is computed by Eq. (1):

$$GV = 0.2989\,R + 0.5870\,G + 0.1140\,B \qquad (1)$$
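As a minimal sketch of the weighted average conversion described above, the function below applies GV = 0.2989 R + 0.5870 G + 0.1140 B to every pixel. The function name `rgb_to_gray` is illustrative, and the input is assumed to be an (H, W, 3) array in R, G, B channel order. These weights are close to, but not identical with, the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114).

```python
import numpy as np

# Luminance weights for the R, G, and B components, as stated in the text.
WEIGHTS = np.array([0.2989, 0.5870, 0.1140])


def rgb_to_gray(rgb):
    """Weighted-average colour-to-gray conversion.

    rgb: array of shape (H, W, 3) in R, G, B order (assumed layout).
    Returns a float array of shape (H, W) holding GV for each pixel.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    # Matrix product over the last axis: GV = 0.2989*R + 0.5870*G + 0.1140*B.
    return rgb @ WEIGHTS
```

For example, a pure red pixel (255, 0, 0) yields 0.2989 * 255 = 76.2195. Because the three weights sum to 0.9999, a pure white pixel maps to about 254.97 rather than exactly 255.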
Decoding and resolving schemes: this issue actually involves two sub-domains.
Algorithm 1: Y = TMC(I, T, C1)
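The steps of Algorithm 1 are not recoverable from this text, and the roles of the inputs I, T, and C1 are not defined here. As a hedged sketch, assuming that TMC scores a gray image I against a character template T using normalized cross-correlation (the Pearson correlation over each window), template matching could look like the following. The names `ncc`, `tmc_match`, and `recognize` are hypothetical, and the parameter C1 is omitted because its meaning is unknown.

```python
import numpy as np


def ncc(patch, template):
    """Normalized cross-correlation (Pearson r) between two equal-size arrays."""
    p = np.asarray(patch, dtype=np.float64).ravel()
    t = np.asarray(template, dtype=np.float64).ravel()
    p = p - p.mean()
    t = t - t.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    if denom == 0.0:
        # A flat patch or template has undefined correlation; treat it as no match.
        return 0.0
    return float((p * t).sum() / denom)


def tmc_match(image, template):
    """Slide the template over the image (assumed TMC scoring; C1 omitted).

    Returns (best correlation score, (row, col) of the best window's top-left corner).
    """
    img = np.asarray(image, dtype=np.float64)
    tpl = np.asarray(template, dtype=np.float64)
    th, tw = tpl.shape
    ih, iw = img.shape
    if th > ih or tw > iw:
        raise ValueError("template is larger than the image")
    best, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = ncc(img[r:r + th, c:c + tw], tpl)
            if score > best:
                best, best_pos = score, (r, c)
    return best, best_pos


def recognize(patch, templates):
    """Return the label whose template correlates best with a same-size patch."""
    scores = {label: ncc(patch, tpl) for label, tpl in templates.items()}
    return max(scores, key=scores.get)
```

The explicit double loop keeps the correlation easy to follow; for larger images, OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` computes the same mean-subtracted, normalized score much faster.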
IV.
[Figure: normalized readout and normalized recognition versus the number of characters, with human recognition for reference.]
[Figure: normalized recognition versus font size (10 to 30) for the ArialBlack, ArialRoundedMT, BodoniMT, Calibri, and Forte fonts, compared with human recognition, panels (a) and (b).]
[Figure: normalized recognition versus normalized noise (0 to 0.8) for the Calibri, Arial Black, Bodoni MT Black, and Cooper Black fonts, compared with human recognition.]
D. Robustness comparison
For comparison, we use Gimpy-based images, which require word recognition in the presence of clutter [19]. We design our text-based CAPTCHA test images with a normalized background noise of 0.35 and a font size of 16. The task is to identify the word in each cluttered image over 200 randomly generated instances, as shown in Fig. 9.
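The text does not define how the normalized noise level is applied to an image. Purely for illustration, the sketch below assumes that a level of 0.35 means 35% of the pixels of a gray image scaled to [0, 1] are replaced by random black or white values (salt-and-pepper noise), and that the recognition rate is the fraction of instances whose decoded word matches the ground truth. Both function names, `add_salt_pepper` and `recognition_rate`, are hypothetical.

```python
import numpy as np


def add_salt_pepper(gray, level, rng):
    """Corrupt a fraction `level` of pixels with random 0/1 values.

    Assumption: "normalized noise" = fraction of corrupted pixels.
    gray: float array in [0, 1]; level: value in [0, 1];
    rng: a numpy.random.Generator.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    noisy = np.array(gray, dtype=np.float64, copy=True)
    # Each pixel is independently selected with probability `level`.
    mask = rng.random(noisy.shape) < level
    # Selected pixels become black (0.0) or white (1.0) with equal probability.
    noisy[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(np.float64)
    return noisy


def recognition_rate(truths, predictions):
    """Fraction of instances whose decoded word equals the ground truth."""
    if len(truths) != len(predictions) or not truths:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(t == p for t, p in zip(truths, predictions))
    return correct / len(truths)
```

In a 200-instance run, each rendered word image (font size 16) would be passed through `add_salt_pepper(img, 0.35, rng)`, decoded by a matcher such as a template-correlation scorer, and the decoded words compared with the ground truth via `recognition_rate`.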
V. CONCLUSION
This paper analyzes the robustness of text-based CAPTCHA images under several degrees of noise and skew, and for different font types and sizes, using the Template Matching Correlation (TMC) technique. The method, algorithm, and implementation are introduced. From the simulation results, based on acceptable robustness and a human reading ability (normalized recognition) in the range of 0.1 to 0.6, the font skew should be designed between 10° and 15°. In addition, for a robust design of text-based CAPTCHA images, we suggest that the font size be greater than 6 and the font type be Calibri. For the noise background, the added noise should lie in the range of 0.3 to 0.4 of the normalized noise to maintain the level of security: humans are still able to recognize the text, but automatic programs find it hard to decode.
VI. FUTURE WORKS
[13] H. Gao, D. Yao, H. Liu, X. Liu, and L. Wang, "A novel image based CAPTCHA using jigsaw puzzle," in IEEE 13th International Conference on Computational Science and Engineering (CSE), 2010, pp. 351-356.
[14] A. Hindle, M. W. Godfrey, and R. C. Holt, "Reverse engineering CAPTCHAs," in Reverse Engineering, 2008, pp. 59-68.
[15] M. Okada and S. Matsuyama, "New CAPTCHA for smartphones and tablet PC," in IEEE Consumer Communications and Networking Conference (CCNC), 2012, pp. 34-35.
[16] A. S. E. Ahmad, J. Yan, and W.-Y. Ng, "CAPTCHA design: Color, usability, and security," IEEE Internet Computing, vol. 16, pp. 44-51, 2012.
[17] J.-S. Lee and M.-H. Hsieh, "Preserving user-participation for insecure network communications with CAPTCHA and visual secret sharing technique," IET Networks, vol. 2, no. 2, 2013.
[18] A. A. Chandavale, A. M. Sapkal, and R. M. Jalnekar, "Algorithm to break visual CAPTCHA," in 2nd International Conference on Emerging Trends in Engineering and Technology (ICETET), 2009, pp. 258-262.
[19] G. Mori and J. Malik, "Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA," in Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), 2003, pp. 134-141.
[20] J. Zhang and X. Wang, "Breaking internet banking CAPTCHA based on instance learning," in International Symposium on Computational Intelligence and Design (ISCID), 2010, pp. 39-43.
[21] J. Yan and A. S. E. Ahmad, "CAPTCHA robustness: A security engineering perspective," Computer, vol. 44, pp. 54-60, 2011.