Reema Abdullah Alabdullatif
College of Computer and Information Sciences, King Saud University, KSA
reema.ksu@gmail.com
ABSTRACT
Forums, blogs, email addresses, video-sharing sites and other services have become targets for commercial
and noncommercial spam. Spammers use bots to crawl websites and harvest email addresses, post spam, or
abuse user accounts. Excessive server load, illegal spam, and theft of resources are among the
consequences of spamming. This paper discusses CAPTCHA as a way to limit spamming.
Keywords
CAPTCHA, security, spam.
1. INTRODUCTION
A CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a program
that generates and grades tests that are solvable by humans but are intended to be beyond the capabilities of
current computer programs [1]. The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum,
Nicholas Hopper and John Langford of Carnegie Mellon University. At the time, they developed the first
CAPTCHA to be used by Yahoo [2]. This technology is now almost a standard security mechanism for defending
against undesirable Internet bot programs, such as those spreading junk email and those grabbing thousands
of free email accounts instantly. It has found widespread application on numerous commercial web sites,
including Google, Yahoo, and Microsoft's MSN [3].
The most common use of CAPTCHA on the web today is to prevent bots from repeatedly submitting forms
automatically, usually for the purpose of spam. Adding a CAPTCHA to a form can cut down on the amount of
spam received via a contact form, or can prevent bots from signing up for accounts on the website.
Spamming is among the top problems that today's webmasters have to deal with. CAPTCHA, on the other hand,
is among the few successful techniques, used by almost all web sites, for controlling automated spamming
activities.
The most widely used CAPTCHAs are text-based schemes, which rely on distorting images of text to make
them unrecognizable to recognition programs. Many other types exist and are covered next.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy
otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
The First MiniConference in Web Technologies and Trends (WTT)
© 2009 Information Technology Department, CCIS, King Saud University, Riyadh, Saudi Arabia
2. TYPES OF CAPTCHA
By far the most common type of CAPTCHA uses letters that are arranged randomly and distorted in some way
against various background colors. These are the ones most likely to be seen when signing up for an email
account. However, alternatives do exist [4].
2.1 Character-Based CAPTCHA
In this category, a string of characters is presented to the user. This string can contain either words
or random alphanumeric characters (see Figure 1).
Figure 1. Different kinds of character-based CAPTCHA with different levels of distortion [5]
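As a minimal sketch of the "generate and grade" idea behind a character-based CAPTCHA, the following Python fragment produces a random alphanumeric challenge and checks a user's answer. The names generate_challenge and grade_response, the six-character length, and the reduced alphabet are assumptions made for illustration. Rendering the string as a distorted image, which is what actually resists recognition programs, is not shown.

```python
import hmac
import secrets
import string

# Assumed alphabet: upper-case letters and digits without look-alikes (0/O, 1/I/L).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "0O1IL"
)


def generate_challenge(length: int = 6) -> str:
    """Return a random string that would then be rendered as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade_response(expected: str, response: str) -> bool:
    """Compare the answer to the challenge, ignoring case and surrounding spaces."""
    normalized = response.strip().upper()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), normalized.encode())
```

In such a design the expected string stays on the server (for example in the user's session), and only the distorted image is sent to the client.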
2.2 Image-Based CAPTCHA
Images or pictures are presented to the user. These normally show an identifiable real-world object,
but can also show shapes. The task is to identify the object shown in the picture.
The problem with this type of CAPTCHA is that it needs a large set of pictures to be effective, which
consumes a large amount of server space.
2.3 Anomaly-Based CAPTCHA
Users are asked to determine which object, character or shape does not belong in a set of images
displayed on the screen. This type of CAPTCHA has the same disadvantage as the image-based CAPTCHA.
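The odd-one-out task can be sketched as follows. This is an illustrative Python fragment, not a scheme taken from the cited sources: the CATEGORIES pool, its labels, and the function names are hypothetical, and a real deployment would map each label to one of many image files.

```python
import secrets

# Hypothetical labelled pool; each label stands in for an image on the server.
CATEGORIES = {
    "animal": ["cat", "dog", "horse", "bird"],
    "vehicle": ["car", "bus", "train", "bicycle"],
    "fruit": ["apple", "banana", "grape", "pear"],
}


def make_anomaly_challenge(set_size: int = 4):
    """Return (items, odd_index): set_size - 1 items from one category plus one outsider."""
    rng = secrets.SystemRandom()
    majority, minority = rng.sample(sorted(CATEGORIES), 2)
    items = rng.sample(CATEGORIES[majority], set_size - 1)
    odd_index = rng.randrange(set_size)
    items.insert(odd_index, rng.choice(CATEGORIES[minority]))
    return items, odd_index


def grade_anomaly(answer_index: int, odd_index: int) -> bool:
    """The user passes by selecting the position of the item that does not belong."""
    return answer_index == odd_index
```

With a pool this small a bot could simply memorise the labels, which illustrates why a large picture set is needed for the scheme to be effective.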
2.4 Recognition-Based CAPTCHA
Users need to determine what is being presented to them. In the case of a CAPTCHA that is both
character-based and recognition-based, the user needs to identify and input the character string that is
presented.
2.5 Sound-Based CAPTCHA
The user is presented with an audio version of a CAPTCHA. The user listens to the audio file and inputs their
answer. A sound-based CAPTCHA can be presented in two formats:
1. Spoken words or numbers.
2. Sounds related to an image.
This CAPTCHA is effective for people who have a visual impairment. It is probably the second most
common type of CAPTCHA (Figure 2).
3. APPLICATIONS OF CAPTCHA
CAPTCHA has several applications for practical security, including [7]:
3.1 Preventing Comment Spam in Blogs
Most bloggers are familiar with programs that submit bogus comments, usually for the purpose of raising
search engine ranks of some website. This is called comment spam. By using a CAPTCHA, only humans
can enter comments on a blog. There is no need to make users sign up before they enter a comment, and no
legitimate comments are ever lost.
3.2 Protecting Website Registration
Several companies (Yahoo!, Microsoft, etc.) offer free email services. Up until a few years ago, most of these
services suffered from a specific type of attack: "bots" that would sign up for thousands of email accounts
every minute. The solution to this problem was to use CAPTCHAs to ensure that only humans obtain free
accounts. In general, free services should be protected with a CAPTCHA in order to prevent abuse by
automated scripts.
3.3 Protecting Email Addresses from Scrapers
Spammers crawl the Web to search for email addresses posted in clear text. CAPTCHAs provide an effective
mechanism to hide email addresses from Web scrapers. The idea is to require users to solve a CAPTCHA
before showing the email address.
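A minimal sketch of this gating idea in Python follows. The function names and the masking format are assumptions for illustration, and captcha_passed stands for the result of whatever CAPTCHA check the site already performs.

```python
def mask_email(address: str) -> str:
    """Return the form served to every visitor, with most of the local part hidden."""
    local, _, domain = address.partition("@")
    return f"{local[:1]}***@{domain}"


def reveal_email(address: str, captcha_passed: bool) -> str:
    """Show the full address only to a visitor who has solved the CAPTCHA."""
    return address if captcha_passed else mask_email(address)
```

The page served to scrapers then only ever contains the masked form, for example reveal_email("alice@example.com", False) yields "a***@example.com".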
3.4 Online Polls
In November 1999, http://www.slashdot.org released an online poll asking which graduate school in computer
science was the best. As is the case with most online polls, IP addresses of voters were recorded in order to
prevent single users from voting more than once. However, students at Carnegie Mellon found a way to stuff
the ballots using programs that voted for CMU thousands of times. CMU's score started growing rapidly. The
next day, students at MIT wrote their own program and the poll became a contest between voting "bots." MIT
finished with 21,156 votes, Carnegie Mellon with 21,032 and every other school with fewer than 1,000.
Can the result of any online poll be trusted? Not unless the poll ensures that only humans can vote.
3.5 Preventing Dictionary Attacks
CAPTCHA can also be used to prevent dictionary attacks in password systems. The idea is simple: prevent a
computer from being able to iterate through the entire space of passwords by requiring it to solve a
CAPTCHA after a certain number of unsuccessful logins. This is better than the classic approach of locking
an account after a sequence of unsuccessful logins, since doing so allows an attacker to lock accounts at
will.
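The policy described above can be sketched in Python as follows. The LoginGuard class, the threshold of three attempts, and the returned status strings are assumptions for illustration. Passwords are compared in plain text only to keep the sketch short; a real system would store and compare salted hashes.

```python
import hmac
from collections import defaultdict

MAX_FREE_ATTEMPTS = 3  # assumed number of failures allowed before a CAPTCHA is demanded


class LoginGuard:
    """Demand a solved CAPTCHA after repeated failures instead of locking the account."""

    def __init__(self, passwords):
        self._passwords = dict(passwords)  # user -> password (plain text for brevity)
        self._failures = defaultdict(int)  # user -> consecutive failed attempts

    def captcha_required(self, user: str) -> bool:
        return self._failures[user] >= MAX_FREE_ATTEMPTS

    def attempt(self, user: str, password: str, captcha_solved: bool = False) -> str:
        # Past the threshold, the password is not even examined without a CAPTCHA.
        if self.captcha_required(user) and not captcha_solved:
            return "captcha_required"
        stored = self._passwords.get(user)
        if stored is not None and hmac.compare_digest(stored.encode(), password.encode()):
            self._failures.pop(user, None)  # a successful login resets the counter
            return "ok"
        self._failures[user] += 1
        return "denied"
```

Because the account is never locked, the legitimate owner can still log in by solving one CAPTCHA, while an automated guesser must solve one for every further attempt.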
3.6 Search Engine Bots
It is sometimes desirable to keep web pages unindexed to prevent others from finding them easily. There is
an HTML tag to prevent search engine bots from reading web pages. The tag, however, doesn't guarantee that
bots won't read a web page. It only serves to say "no bots, please." Search engine bots, since they usually
belong to large companies, respect web pages that don't want to allow them in. However, in order to truly
guarantee that bots won't enter a web site, CAPTCHAs are needed.
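To illustrate why the tag is only a request, the sketch below shows, in Python, how a well-behaved crawler might look for a robots meta tag such as <meta name="robots" content="noindex"> before indexing a page. The RobotsMetaParser class and the may_index helper are hypothetical names; the parsing uses the standard html.parser module. Nothing forces a bot to run a check like this, which is the gap a CAPTCHA is meant to close.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def may_index(html: str) -> bool:
    """Return False when the page asks not to be indexed ("noindex" or "none")."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)
```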
3.7 Worms and Spam
CAPTCHAs also offer a plausible solution against email worms and spam: "I will only accept an email if I
know there is a human behind the other computer." A few companies are already marketing this idea.
4. CAPTCHA DEVELOPMENT
Developers have recognized the accessibility shortcomings of the visual CAPTCHA and have begun research
into sound-based CAPTCHA. One major shortcoming of CAPTCHA based on spoken text or numbers is that
the audio has to be distorted to defeat the use of automated speech recognition to solve the challenges.
Because of this distortion, it becomes difficult even for a human to differentiate between the distortion and
the valid data [8].
5. CAPTCHA FUTURE
CAPTCHA design should pay attention to the values of universal usability. Tools should support a large range
of users of different backgrounds and abilities. Current CAPTCHA systems create a separation between their
visual and audio CAPTCHA. The audio CAPTCHA is essentially a distinct system with a completely
independent development and maintenance path. Alternatively, the visual and audio CAPTCHA can be joined
into a single system in which the audio is directly related to the visual elements that are presented
to the user. This type of CAPTCHA will be more accessible for users with visual impairments, and may also
be easier to adapt to different languages and cultures.
6. CONCLUSION
Sites with attractive resources and millions of users will always need access control systems that limit
their misuse. At that level, it is reasonable to employ many concurrent approaches, including audio and
visual CAPTCHA, to do so. However, it must be ensured that users with disabilities can still interact with a
given resource in a reasonable amount of time.
7. REFERENCES
[1] L von Ahn, M Blum and J Langford. "Telling Humans and Computers Apart Automatically", CACM, V47,
No2, 2004.
[2] The Official CAPTCHA Site. Located on the Internet at http://www.captcha.net/.
Last visited: 12 December, 2008.
[3] J Yan, A Salah El Ahmad. "A Low-cost Attack on a Microsoft CAPTCHA".
[4] G Sauer, H Hochheiser, J Feng, and J Lazar. "Towards a Universally Usable CAPTCHA".
[5] Network Security Research and AI, Located on the Internet at http://networksecurity
research.blogspot.com/. Last visited: 11 December, 2008
[6] ReCAPTCHA: Stop Spam Read Books (2007), Located on the Internet at http://recaptcha.net/.
Last visited: 11 December, 2008.
[7] The Official CAPTCHA Site. Located on the Internet at http://www.captcha.net/.
Last visited: 12 December, 2008.
[8] G Sauer, H Hochheiser, J Feng, and J Lazar. "Towards a Universally Usable CAPTCHA".