The 4th Joint International Conference on Information and Communication Technology, Electronic and Electrical Engineering (JICTEE-2014)

Analysis of Text-Based CAPTCHA Images using Template Matching Correlation Technique
Promprawatt Sakkatos, Weeratham Theerayut, Vijitketteepragorn Nuttapol and Pongyupinpanich Surapong
Computer Engineering Department (RIEES Lab.), Faculty of Engineering,
Ramkhamhaeng University, Bangkapi, Bangkok 10240, Thailand
Email: surapong@riees.org
Abstract: Text-based CAPTCHA images are widely used in online applications to guard against malicious programs that attempt to disrupt execution or computation. Although installing a CAPTCHA enhances system security, it has to be continuously analysed, improved and developed so that it remains hard for automatic programs to decode or extract. This paper focuses on the examination of text-based CAPTCHA images with several degrees of noise, skew, font type and size. The Template Matching Correlation (TMC) technique, consisting of image conversion, threshold, noise rejection, segmentation and recognition methods, is introduced for the analysis. The simulation results show that robustness increases when the image is distorted by background noise in the range of 0.3 to 0.4 and by font skew in the range of 10° to 15°; however, the text is still fluently recognized by humans.

I. INTRODUCTION

The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) [1] [2] is widely employed in online procedures of website applications, such as Google's or Yahoo's free email, to ensure that a user is a human and not an automatic program, as shown in Fig. 1. The basic idea of a CAPTCHA is to resist decoding by segmentation and character recognition algorithms, or to provide permission levels for search engines [3]. Some users apply an automatic program to sign up for thousands of email accounts every minute in order to break or disturb systems. Once successful, they gain permissions on the systems, such as the ability to send out thousands of mails in a certain period. Thus, investigating the defects of CAPTCHAs helps to improve their robustness and security against attacks from automatic programs. The preliminary version of CAPTCHA picks a word at random from a dictionary, renders it as an image, and then adds distortion and a noisy background. Afterwards, the user is asked to decode and verify the word appearing in the image. With the given word deformation, most humans succeed at this test, while optical character recognition (OCR) programs fail. Currently, OCR programs have been developed based on artificial intelligence (AI) schemes that are able to screen and generate a set of possible words from the examination [4]. Therefore, the technique of picking a random word from storage has become simple to break. Random-character, graphic-image, or audio CAPTCHAs are possible solutions to increase the robustness and security of protection systems [5]. This paper focuses on random-character, i.e. text-based, schemes, as commercial applications use them against malicious programs.
978-1-4799-3855-1/14/$31.00 2014 IEEE

Fig. 1: Examples of CAPTCHA images used by (a) Google's free mail, (b) internet banking.

II. STATE OF THE ART

Most of the literature on CAPTCHA deals with rendering visual characters, i.e. skew, rotation, character intensity, multiple-color backgrounds, line or curve backgrounds, etc. Few studies deal with analysis, robustness and security with respect to decoding or breaking CAPTCHAs [6]. Nevertheless, both domains remain challenging for computer vision researchers and can be classified into two main issues.

Design and implementation: the literature on this issue has focused on the construction, design and implementation of text-based, graphic-based, and audio-based CAPTCHAs.

Sound-based CAPTCHAs rely on human auditory perception to identify words or letters in a sound clip that is distorted and mixed with background noise. A typical sound-based CAPTCHA is reCAPTCHA [7]. The combination of audio and a visual image was introduced by Graig Sauer et al. [8] [9]. However, audio-based CAPTCHAs from popular Web sites have been broken with accuracy of up to 71% [10], and they are difficult and time-consuming to use [11].

Graphic-based CAPTCHAs exploit a weakness of pattern recognition, namely that the comparison process is difficult to execute on graphic information [12]. A jigsaw puzzle CAPTCHA was presented by H. Gao et al. [13]. Although their experiments and security analysis show that human users can complete the CAPTCHA verification quickly and accurately, it is difficult to use and not favoured in existing web applications.

Text-based CAPTCHAs are designed to be both simple to use and quick to solve, and much research attention is paid to this area. For instance, A. Hindle et al. [14] studied text-based CAPTCHA images with reverse engineering techniques, i.e. bitmap comparison, threshold, segmentation, dilation and erosion. A text-based CAPTCHA for smartphones and tablet PCs was introduced by Mitsuo Okada et al. [15] to prevent OCR programs from bypassing verification. The technique used multiple noise images, where invisible objects or texts were hidden within a certain area. Design based on color, usability, and security was examined by A. S. E. Ahmad et al. [16]. The CAPTCHA image uses a simple color scheme to increase its usability and to avoid potentially complicated consequences for usability and security. J. S. Lee and M. H. Hsieh [17] proposed an effective method to help confirm the authentication process within a network system.

1) Colour-to-Gray Conversion: this step converts a red-green-blue (RGB) image or a color-map image to a gray-scale image. Matching the luminance of the RGB image is significant. There are several methods for conversion and matching, such as the maximum method, the average method and the weighted-average method. For this computation, the weighted-average method is used, where the luminance weights of the R, G, and B components are constant at 0.2989, 0.5870, and 0.1140, respectively. A gray value (GV) is computed by Equ. (1).
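The weighted-average conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image layout (rows of (R, G, B) tuples), the rounding to integers and the function names are choices of this sketch, not part of the paper, whose implementation relies on MATLAB's rgb2gray.

```python
# Weighted-average colour-to-gray conversion following Equ. (1).
# Assumption: an image is a list of rows, each row a list of (R, G, B) tuples.

WEIGHTS = (0.2989, 0.5870, 0.1140)  # luminance weights for R, G, B from the text


def gray_value(r, g, b):
    """Return the gray value GV of one RGB pixel, rounded to an integer."""
    wr, wg, wb = WEIGHTS
    return int(round(wr * r + wg * g + wb * b))


def rgb_to_gray(image):
    """Convert a 2-D list of (R, G, B) tuples into a 2-D list of gray values."""
    return [[gray_value(*pixel) for pixel in row] for row in image]


# Hypothetical 2 x 2 test image: white, red, green, blue.
rgb = [[(255, 255, 255), (255, 0, 0)],
       [(0, 255, 0), (0, 0, 255)]]
gray = rgb_to_gray(rgb)
```

Green contributes most to the gray value and blue least, which is why the pure-green pixel ends up much brighter than the pure-blue one.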

Decoding and resolving schemes: this issue covers two sub-domains.

2) Threshold Computation: this step converts a gray-scale image to a black-white image for the following purposes: reducing computation and extracting objects from the background. Since a CAPTCHA image consists of dark objects over a bright background, a global threshold value (T) is applied for extraction, as illustrated in Equ. (2),

\[ T = \frac{\mu_1 + \mu_2}{2}, \qquad (2) \]

where \mu_1 and \mu_2 are the intensity values of the R1 and R2 regions. The result Z(x, y), based on the gray value at any point (x, y) and on T, is calculated by Equ. (3).

\[ Z(x, y) = \begin{cases} 255 & f(x, y) \geq T \\ 0 & \text{otherwise} \end{cases} \qquad (3) \]
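A minimal Python sketch of Equ. (2) and Equ. (3) is given below. It assumes that \mu_1 and \mu_2 are the mean intensities of the two regions and that T is refined iteratively until it stabilizes, which is the usual basic global thresholding procedure but is not spelled out in the text. The function names, the tolerance and the sample image are hypothetical. The implementation section names MATLAB's graythresh, which uses Otsu's method, so this sketch illustrates the equations rather than that function.

```python
# Global threshold from Equ. (2) and binarization from Equ. (3).
# Assumptions: mu1/mu2 are region means; T is iterated until it stops changing.

def global_threshold(gray, tol=0.5):
    """Iterate T = (mu1 + mu2) / 2 until it changes by less than tol."""
    pixels = [v for row in gray for v in row]
    t = sum(pixels) / len(pixels)            # start from the overall mean
    while True:
        r1 = [v for v in pixels if v >= t]   # brighter region R1
        r2 = [v for v in pixels if v < t]    # darker region R2
        if not r1 or not r2:                 # flat image: nothing to separate
            return t
        mu1 = sum(r1) / len(r1)
        mu2 = sum(r2) / len(r2)
        new_t = (mu1 + mu2) / 2.0            # Equ. (2)
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def binarize(gray, t):
    """Equ. (3): 255 where f(x, y) >= T, 0 otherwise."""
    return [[255 if v >= t else 0 for v in row] for row in gray]


# Hypothetical gray image: a dark 2 x 2 object on a bright background.
gray = [[200, 210, 205, 198],
        [202,  30,  25, 207],
        [199,  28,  35, 204]]
T = global_threshold(gray)
bw = binarize(gray, T)
```

With a clearly bimodal image such as this one, the threshold settles midway between the two region means after a single refinement.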

Segmentation: a breaking algorithm for the text-based CAPTCHA images of an internet banking application was discussed by J. Zhang and X. Wang [20]. According to their investigation, in a difficult CAPTCHA it has to be hard to separate the text from the background and to segment the characters [21]. Decoding procedures are generally divided into three domains: preprocessing, segmentation, and recognition.

Recognition and matching: A. A. Chandavale et al. [18] proposed an algorithm to decode a text-based CAPTCHA image. The algorithm was mainly developed for robustness analysis, based on the segmentation and recognition of characters depending on image features, but it lacked matching accuracy. G. Mori et al. [19] developed efficient methods based on shape context matching that can identify the word in an EZ-Gimpy image with a success rate of 92%, and the requisite 3 words in a Gimpy image 33% of the time. They introduced a general framework which was applied to other recognition problems.

From the above literature, text-based CAPTCHA images play an important role in existing online web applications due to their simplicity and the short time they consume. This paper thus considers the robustness analysis of text-based CAPTCHA images with several degrees of noise, skew, font type and size. The rest of this paper is organized as follows. Section III presents the methodology of the Template Matching Correlation (TMC) technique and its implementation, consisting of image conversion, threshold, noise rejection, segmentation and recognition. The simulation results as well as the robustness comparison are described in Section IV. The conclusion is given in Section V. Finally, Section VI presents our future works.
III. TECHNIQUE, ALGORITHM AND IMPLEMENTATION

A. Template Matching Correlation (TMC)

The basic idea of the template matching technique is to perform a calculation at each position of an image. For the examination, a distortion function measures the degree of similarity between a template and the image. The position of minimum distortion or maximum correlation is then taken as the location of the template in the examined image. Each process of the TMC technique is described in steps 1) to 5).
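The idea of evaluating a distortion function at every image position can be sketched as follows. The sketch assumes the sum of squared differences as the distortion measure, which is one common choice; the paper does not state which distortion function it uses. The function names and the tiny binary image are hypothetical.

```python
# Locating a template by minimum distortion.
# Assumption: distortion = sum of squared differences (SSD) over the window.

def distortion(image, template, top, left):
    """SSD between the template and the image window at (top, left)."""
    total = 0
    for i, t_row in enumerate(template):
        for j, t_val in enumerate(t_row):
            diff = image[top + i][left + j] - t_val
            total += diff * diff
    return total


def locate_template(image, template):
    """Return (row, col, distortion) of the best-matching window position."""
    rows, cols = len(image), len(image[0])
    t_rows, t_cols = len(template), len(template[0])
    best = None
    for top in range(rows - t_rows + 1):
        for left in range(cols - t_cols + 1):
            d = distortion(image, template, top, left)
            if best is None or d < best[2]:
                best = (top, left, d)
    return best


# Hypothetical 4 x 5 binary image containing the 2 x 2 template once.
image = [[0, 0, 0, 0, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]
template = [[1, 1],
            [1, 0]]
row, col, d = locate_template(image, template)
```

A distortion of zero means the window is identical to the template; replacing the SSD with a correlation coefficient and taking the maximum instead of the minimum gives the correlation variant.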

\[ GV = (0.2989 \times R) + (0.5870 \times G) + (0.1140 \times B) \qquad (1) \]

In Equ. (3), f(x, y) is the function that gives the gray value at any point p(x, y).
3) Noise Rejection: since most of the noise appearing in the background of the image is removed by the threshold method, only scattered noise remains. The scattered noise is rejected by the selection of a constant value (P) and the structure connectivity (CON), both of which are in the range of 0 to 1. The noise rejection function (NR) is shown in Equ. (4).

\[ Q(x, y) = NR(Z(x, y), P, CON) \qquad (4) \]
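One way to read the NR function is as small-object removal, as MATLAB's bwareaopen (named in the implementation section) does: connected groups of foreground pixels with fewer pixels than a given count are erased. The Python sketch below follows that reading. The integer `min_area` and the 4/8 `connectivity` argument are assumptions of this sketch and do not reproduce the 0-to-1 ranges stated for P and CON.

```python
# Removing scattered noise as small connected components.
# Assumptions: dark foreground (0) on white background (255); min_area is a
# pixel count and connectivity is 4 or 8, unlike the 0..1 ranges in the text.
from collections import deque

NEIGHBOURS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    8: [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
}


def reject_noise(bw, min_area, connectivity=8, fg=0, bg=255):
    """Erase foreground components that contain fewer than min_area pixels."""
    rows, cols = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if bw[r][c] != fg or seen[r][c]:
                continue
            # Collect one connected component with a breadth-first search.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in NEIGHBOURS[connectivity]:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bw[ny][nx] == fg and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_area:
                for y, x in component:
                    out[y][x] = bg
    return out


W, K = 255, 0  # white background, black foreground
bw = [[W, W, W, W, W, W],
      [W, K, K, W, W, K],
      [W, K, K, W, W, W],
      [W, W, W, W, W, W]]
clean = reject_noise(bw, min_area=3)
```

In this hypothetical image the 2 x 2 block survives because it has four pixels, while the isolated pixel in the last column is treated as scattered noise and erased.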

4) Segmentation: since the characters of the CAPTCHA image are mostly separated by space, the basic idea of the segmentation is to detect space along vertical and horizontal lines, scanning either left-to-right or right-to-left. Meanwhile, each group of connected pixels is detected, classified and recorded. Finally, the row and column positions of each group are applied to separate the characters. However, if the width of a character is too large, the segmentation process is repeated.
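The space-detection idea can be sketched with a column scan followed by a row trim. This is a simplified illustration, assuming dark characters (0) on a white background (255) and characters that do not touch; the function names are hypothetical, and the repeated pass for over-wide characters mentioned in the text is not implemented.

```python
# Segmenting characters by detecting blank columns and rows.
# Assumptions: foreground is 0, background is 255, characters do not touch.

def split_columns(bw, fg=0):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    cols = len(bw[0])
    has_ink = [any(row[c] == fg for row in bw) for c in range(cols)]
    ranges, start = [], None
    for c, ink in enumerate(has_ink):      # scan left-to-right
        if ink and start is None:
            start = c                      # a character begins
        elif not ink and start is not None:
            ranges.append((start, c))      # a blank column ends it
            start = None
    if start is not None:
        ranges.append((start, cols))
    return ranges


def crop_rows(bw, start, end, fg=0):
    """Cut columns start..end-1 and trim blank rows above and below."""
    rows = [row[start:end] for row in bw]
    inked = [i for i, row in enumerate(rows) if fg in row]
    return rows[inked[0]:inked[-1] + 1]


def segment(bw, fg=0):
    """Return one cropped sub-image per detected character."""
    return [crop_rows(bw, s, e, fg) for s, e in split_columns(bw, fg)]


W, K = 255, 0
bw = [[W, W, W, W, W, W, W],
      [W, K, K, W, K, W, W],
      [W, K, W, W, K, W, W],
      [W, W, W, W, W, W, W]]
chars = segment(bw)
```

The recorded column ranges play the role of the row and column positions mentioned above; a width check on each range would be the natural place to trigger a second segmentation pass.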
5) Normalization and Recognition: resizing, scaling or transforming the characters is performed in this step, as shown in Equ. (5),

\[ x' = r_x \, x, \qquad y' = r_y \, y, \qquad (5) \]

where r_x and r_y are the scaling ratios along the directions of the coordinate axes. Nearest-neighbour interpolation is applied to approximate the x and y values for non-integer coordinates after the transformation. Afterwards, the normalized characters are compared with the standard characters in a template database by using the characters' coefficient values.
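The normalization and comparison step can be sketched as a nearest-neighbour resize followed by a 2-D correlation coefficient, the quantity that MATLAB's corr2 computes. In this sketch the 4 x 4 normalized size, the two-entry template dictionary and the use of 1 for ink and 0 for background are hypothetical simplifications; the paper stores its templates at 24 x 24 pixels.

```python
# Nearest-neighbour normalization and correlation-based recognition.
# Assumptions: ink = 1, background = 0; templates and sizes are illustrative.
import math


def resize_nearest(char, out_rows, out_cols):
    """Nearest-neighbour resize of a 2-D list to out_rows x out_cols."""
    in_rows, in_cols = len(char), len(char[0])
    out = []
    for i in range(out_rows):
        src_i = min(in_rows - 1, int(i * in_rows / out_rows))
        out.append([char[src_i][min(in_cols - 1, int(j * in_cols / out_cols))]
                    for j in range(out_cols)])
    return out


def corr2(a, b):
    """2-D correlation coefficient between two equally sized matrices."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa)
                    * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0       # constant matrices carry no signal


def recognize(char, templates, size=(4, 4)):
    """Return (label, coefficient) of the template that correlates best."""
    norm = resize_nearest(char, *size)
    scores = {label: corr2(norm, resize_nearest(t, *size))
              for label, t in templates.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]


templates = {"L": [[1, 0], [1, 1]],        # hypothetical template database
             "J": [[0, 1], [1, 1]]}
char = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
label, score = recognize(char, templates)
```

Because both the segmented character and every template are brought to the same size first, the coefficient compares shapes independently of the original font size.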

Fig. 2: The sample text-based CAPTCHA images for testing and evaluation with (a) font skew and (b) noise background.

Fig. 3: The thresholded CAPTCHA images with (a) font skew and (b) noise background.

B. Algorithm and Implementation

The TMC technique is described by Algorithm 1 and implemented in MATLAB Simulink in the following steps:

Step 1  Initialization: the input image of n × n dimension is checked for the RGB format. If it is an RGB image, it is transformed into the gray-scale format by the rgb2gray function. The initial image is illustrated in Fig. 2.

Step 2  Threshold computation: the image is converted from the gray-scale format to the black-white format by applying the threshold value (T) computed by the graythresh function. The T value is used by the im2bw function to decide whether a pixel should be black or white. The thresholded image is shown in Fig. 3.

Step 3  Noise rejection: since solid noise is removed by the threshold step, scattered noise consisting of small objects is filtered in this step. The ambiguous pixels are removed by a gain value determined by the bwareaopen function, calculated based on the size of a pixel. If the computed result corresponding to the position x and y of a pixel is lower than the constant value, the pixel is eliminated, as shown in Fig. 4.

Step 4  Normalization and recognition: the characters of the image are separated by first counting the number of characters with the bwlabel function. The function finds and counts the groups of connected pixels and then calculates their row and column positions, which are used for the separation. Afterwards, the separated characters are compared with the template database, stored in matrix form at 24 × 24 pixels, based on their coefficient values from the corr2 function, as depicted in Fig. 5.

Algorithm 1 Y = TMC(I, T, C1)
    {Step1: Initialization process}
    if (I ≠ RGB) then
        Exit
    else
        A = RGB-TO-GRAY(I)
    end if
    {Step2: Threshold computation}
    L = GRAY-TO-THRESHOLD(A)
    BW = IMG-TO-BW(A, L)
    {Step3: Noise rejection}
    X0 = REMOVE-NOISE(BW, C1)
    {Step4: Normalization and recognition}
    X1 = SEPARATION(X0)
    Y = COMPARISON(X1, T)
    return Y

IV. EXPERIMENTAL RESULTS AND COMPARISON

To investigate robustness, text-based CAPTCHA images are analysed with the MATLAB Simulink tool using the TMC technique. The analysis framework is derived from several influencing factors, i.e. font size, font type, the number of characters and noise background.
A. Skew and characters effect

Fig. 4: The rejected noise of CAPTCHAs with (a) font skew and (b) noise background.

Fig. 6 presents the normalized recognition results while the skew is varied from 0° to 50° and the number of characters is increased from 4 to 15. The minimum boundary at which humans are able to recognize the characters is 0.1, as shown by the red dotted line in the figure; this value is assumed from our experiment. From the figure, although a skew greater than 15° gives a low recognition rate (high robustness), the characters are hard for humans to recognize. To match the acceptable robustness and human ability (normalized recognition) determined from our experiment, the suitable skew is thus one for which the normalized recognition lies between 0.2 and 0.6. For example, supposing that a text-based CAPTCHA image is designed with 10 characters, the possible skew with acceptable robustness is either 10° or 15°, as illustrated by the dotted yellow line.
Fig. 5: The extracted and recognized CAPTCHA images with (a) font skew and (b) noise background.

Fig. 6: The average of the normalized recognition with the number of characters increasing from 4 to 15 and various degrees of skew from 0° to 50°.

B. Size effect

The normalized read-out and recognition while the font size is varied from 4 to 30 with the font types Arial Black, Arial Rounded MT, Bodoni MT, Calibri, and Forte are shown in Fig. 7a and 7b. From the figures, the minimum font size at which humans are able to read out and recognize the characters is 6. The term read-out means the number of characters decoded by the program, relative to the number of input characters of a CAPTCHA image, without considering correctness. Assuming that the acceptable robustness and human ability lie between 0.1 and 0.6, the simulation results can be expressed as follows: 1) the font size and the sampled font types do not affect the robustness of the text-based CAPTCHA images, 2) the robustness is enhanced while the font size is in the range of 6 to 20 with the font type Calibri, as shown in Fig. 7b.

Fig. 7: The normalized results of (a) read-out, (b) recognition with five font types and various font sizes from 4 to 30.

C. Noise background effect

Fig. 8 shows the normalized recognition with four font types, i.e. Calibri, Arial Black, Bodoni MT Black, and Cooper Black, while the noise background is changed. From the simulation results, all font types stabilize at a normalized noise of 0.4 (40%). However, since the minimum boundary of the normalized recognition at which humans can read is 0.1, the compromise area between acceptable robustness and human ability lies in the range of 0.1 to 0.6 of the normalized recognition and 0.2 to 0.8 of the normalized noise.

Fig. 8: The normalized recognition with the four font types, where the normalized noise is varied from 0 to 1.

D. Robustness comparison

The Gimpy-based image, i.e. word recognition in the presence of clutter [19], is used for comparison. We design our text-based CAPTCHA image for testing with the noise background at 0.35 and font size 16. Our task is to identify a word in the cluttered image over 200 randomly generated image instances, as shown in Fig. 9.

Fig. 9: A sample of the Gimpy-based image with the noise background at 0.35 and font size 16.

TABLE I: The comparison result of the Gimpy-based images of Greg Mori [19] with the proposed TMC method.

#Word Recognition    Percentage of test (Greg Mori [19])    Percentage of test (TMC)
1                    92%                                    92%
2                    75%                                    80%
3                    33%                                    33%
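The read-out and recognition measures used in Section IV can be sketched as simple ratios. This is an illustration under stated assumptions: the paper defines read-out relative to the number of input characters, while normalizing recognition the same way, comparing characters position by position and capping read-out at 1 are assumptions of this sketch. The sample strings are hypothetical and not taken from the experiments.

```python
# Normalized read-out and recognition for one CAPTCHA image.
# Assumptions: both measures are divided by the number of input characters;
# correctness is judged position by position; read-out is capped at 1.

def normalized_readout(decoded, truth):
    """Characters returned by the decoder divided by characters in the image."""
    return min(len(decoded), len(truth)) / len(truth)


def normalized_recognition(decoded, truth):
    """Correctly decoded characters divided by characters in the image."""
    correct = sum(1 for d, t in zip(decoded, truth) if d == t)
    return correct / len(truth)


def in_design_band(value, low=0.1, high=0.6):
    """True if a normalized recognition lies in the band used in the paper."""
    return low <= value <= high


truth = "A7KQ2M"      # hypothetical CAPTCHA text
decoded = "A7XQ"      # hypothetical decoder output
readout = normalized_readout(decoded, truth)
recognition = normalized_recognition(decoded, truth)
acceptable = in_design_band(recognition)
```

A decoder can score high on read-out while scoring low on recognition, which is why the two panels of Fig. 7 are reported separately.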

V. CONCLUSION

This paper presents the analysis and investigation of the robustness of text-based CAPTCHA images with several degrees of noise, skew, font type and size, based on the Template Matching Correlation (TMC) technique. The method, algorithm and implementation are introduced. From the simulation results, based on the acceptable robustness and the human reading ability (normalized recognition) in the range of 0.1 to 0.6, the designed font skew should be between 10° and 15°. In addition, for the robust design of text-based CAPTCHA images, the font size should be greater than 6 and the font type should be Calibri. For the noise background, the included noise should be in the range of 0.3 to 0.4 of the normalized noise to maintain the level of security, so that the image can be recognized by humans but is hard for automatic programs to decode.
VI. FUTURE WORKS

Since our work mainly considers the robustness analysis of text-based CAPTCHA images, improving the decoding ability of our automatic program becomes our target. Our future works are thus as follows: to introduce an efficient computation method that yields a suitable threshold value, to enhance the efficiency of noise rejection, and to improve the performance of the segmentation and recognition processes.
REFERENCES
[1] L. von Ahn, M. Blum, and J. Langford, "Telling humans and computers apart automatically: How lazy cryptographers do ai," Comm. ACM, vol. 47, no. 2, pp. 57–60, 2004.
[2] A. Kolupaev and J. Ogijenko, "Captchas: Humans vs. bots," IEEE Security & Privacy, vol. 6, pp. 68–70, 2008.
[3] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford, "Captcha: Using hard ai problems for security," in International Conference Theory Application Cryptograph Technique, May 2003, pp. 294–311.
[4] J. Yan and A. S. E. Ahmad, "Captcha security: A case study," IEEE Security & Privacy, vol. 7, no. 4, pp. 22–28, 2009.
[5] M. A. Kouritzin, F. Newton, and W. Biao, "On random field completely automated public turing test to tell computers and humans apart generation," IEEE Transactions on Image Processing, vol. 22, pp. 1656–1666, 2013.
[6] D. Kapoor, H. Bangar, Abhishek, and A. Sethi, "An ingenious technique for symbol identification from high noise captcha images," in Annual IEEE India Conference (INDICON), 2012, pp. 98–103.
[7] J. Lung, "Ethical and legal considerations of recaptcha," in Tenth Annual International Conference on Privacy, Security and Trust (PST), 2012, pp. 211–216.
[8] G. Sauer, H. Hochheiser, J. Feng, and J. Lazar, "Towards a universally usable captcha," in Proceedings of the Symposium on Accessible Privacy and Security, ACM Symposium On Usable Privacy and Security (SOUPS'08), Pittsburgh, PA, USA, 2008.
[9] H. Gao, H. Liu, D. Yao, X. Liu, and U. Aickelin, "An audio captcha to distinguish humans from computers," in Third International Symposium on Electronic Commerce and Security (ISECS), 2010, pp. 265–269.
[10] J. Tam, J. Simsa, S. Hyde, and L. von Ahn, "Breaking audio captchas," in Advances in Neural Information Processing Systems, 2008.
[11] J. Bigham and A. Cavender, "Evaluating existing audio captchas and an interface optimized for non-visual use," in Proceedings of ACM CHI Conference on Human Factors in Computing Systems, 2009, pp. 1829–1838.
[12] R. Rahman, D. Tomar, and S. Das, "Dynamic image based captcha," in International Conference on Communication Systems and Network Technologies (CSNT), 2012, pp. 90–94.
[13] H. Gao, D. Yao, H. Liu, X. Liu, and L. Wang, "A novel image based captcha using jigsaw puzzle," in IEEE 13th International Conference on Computational Science and Engineering (CSE), 2010, pp. 351–356.
[14] A. Hindle, M. W. Godfrey, and R. C. Holt, "Reverse engineering captchas," in Reverse Engineering, 2008, pp. 59–68.
[15] M. Okada and S. Matsuyama, "New captcha for smartphones and tablet pc," in IEEE Consumer Communications and Networking Conference (CCNC), 2012, pp. 34–35.
[16] A. S. E. Ahmad, J. Yan, and W.-Y. Ng, "Captcha design: Color, usability, and security," IEEE Internet Computing, vol. 16, pp. 44–51, 2012.
[17] J.-S. Lee and M.-H. Hsieh, "Preserving user-participation for insecure network communications with captcha and visual secret sharing technique," IET Networks, vol. 2, no. 2, 2013.
[18] A. A. Chandavale, A. M. Sapkal, and R. M. Jalnekar, "Algorithm to break visual captcha," in Emerging Trends in Engineering and Technology (ICETET), 2009 2nd International Conference on, 2009, pp. 258–262.
[19] G. Mori and J. Malik, "Recognizing objects in adversarial clutter: breaking a visual captcha," in CVPR'03 Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition, 2003, pp. 134–141.
[20] J. Zhang and X. Wang, "Breaking internet banking captcha based on instance learning," in Computational Intelligence and Design (ISCID), 2010 International Symposium on, 2010, pp. 39–43.
[21] J. Yan and A. S. E. Ahmad, "Captcha robustness: A security engineering perspective," Computer, vol. 44, pp. 54–60, 2011.
