A CAPTCHA is a program that protects websites against bots by generating and grading tests
that humans can pass but current computer programs cannot. For example, humans can read
distorted text, but current computer programs can't.
The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and
Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and
John Langford of Carnegie Mellon University.
Today we are going to see how a CAPTCHA works and how it reduces automated form
submissions such as sign-ups. We will also create a simple CAPTCHA script in PHP to
illustrate this.
1. Create Random Value: A random string is generated. Random values are hard to
guess and predict.
2. Generate an Image: The string is drawn into an image, because text in an image is generally
a lot harder for computers to read while staying readable to humans. This is also the most
important step, as plain text in an image can be read (and the CAPTCHA cracked) quite easily.
To make it difficult, developers employ different techniques so that the text in the image
becomes hard for computers to read. Some draw zig-zag lines in the background while
others twist and turn individual characters in the image. The possibilities are many, and new
techniques are being developed all the time as crackers keep finding ways to
break them.
3. Store it: The random string generated (which is also in the image) is stored for matching
against the user input. The easiest way to do so is to use session variables.
4. Matching: After the above step, the CAPTCHA image is generated and shown on the
form which we want to protect from abuse. The user fills in the form along with
the CAPTCHA text and submits it. Now we have the following:
1. All submitted form data.
2. CAPTCHA string (from form), input by user.
3. CAPTCHA string (real one, generated by us), from the session variable. A session
variable is generally used as it keeps stored values across page requests. Here,
we need to preserve the stored value from one page (the form page) to another (the
action page that receives the form data).
5. If both match, the submission is accepted. Otherwise we can tell the user that the
CAPTCHA they entered was wrong and their form could not be submitted. You
could also ask them to try again.
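Leaving the image aside, steps 1, 3, 4 and 5 can be sketched in a few lines. This is only a minimal sketch, not the script built later in this article: the function names captcha_issue() and captcha_matches() are made up for illustration, a plain array stands in for $_SESSION so the code runs without a web server, and random_bytes() (PHP 7 or later) is assumed for step 1.

```php
<?php
// Hypothetical helper: steps 1 and 3 (create a random value, store it).
// $session stands in for $_SESSION in this sketch.
function captcha_issue(array &$session, $length = 5)
{
    // bin2hex() yields two hex characters per random byte, so trim to $length
    $text = substr(bin2hex(random_bytes($length)), 0, $length);
    $session['key'] = $text;
    return $text; // step 2 would draw this text into an image
}

// Hypothetical helper: steps 4 and 5 (match the user input).
function captcha_matches(array $session, $input)
{
    return isset($session['key'])
        && is_string($input)
        && hash_equals($session['key'], $input);
}

$session = array();
$shown = captcha_issue($session);
var_dump(captcha_matches($session, $shown));       // bool(true)
var_dump(captcha_matches($session, $shown . 'x')); // bool(false)
```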
From the steps above it's quite clear that when someone requests the form page, the CAPTCHA
text is generated and sent back to the requesting user, but only in the form of an image. If the
requester is a human, they will not have much difficulty reading the image and typing the text when
asked, but a bot will have difficulty working out what's in the image. In the next step, when
we match the string generated and the one the user has input, we can restrict automated form
submissions.
The following is the code that does this. It just outputs the CAPTCHA image to the browser
when the script is requested:
<?php
/********************************************************
* File: captcha.php *
* Author: Arvind Gupta (www.arvindgupta.co.in) *
* Date: 12-Mar-2009 *
* Description: This file can be embedded as an image *
* to show a CAPTCHA. *
********************************************************/
// The number of characters you
// want your CAPTCHA text to have
define('CAPTCHA_STRENGTH', 5);
/****************************
* INITIALISE *
****************************/
// Tell PHP we're going to use
// Session vars
session_start();
// Md5 to generate the random string
$random_str = md5(microtime());
// Trim required number of characters
$captcha_str = substr($random_str, 0, CAPTCHA_STRENGTH);
// Allocate new image
$width = (CAPTCHA_STRENGTH * 10)+10;
$height = 20;
$captcha_img =ImageCreate($width, $height);
// ALLOCATE COLORS
// Background color-black
$back_color = ImageColorAllocate($captcha_img, 0, 0, 0);
// Text color-white
$text_color = ImageColorAllocate($captcha_img, 255, 255, 255);
// Line color-red
$line_color = ImageColorAllocate($captcha_img, 255, 0, 0);
/****************************
* DRAW BACKGROUND & *
* LINES *
****************************/
// Fill background color
ImageFill($captcha_img, 0, 0, $back_color);
// Draw vertical lines, spaced along the x-axis
for($i = 0; $i < $width; $i += 5)
ImageLine($captcha_img, $i, 0, $i, $height, $line_color);
// Draw horizontal lines, spaced along the y-axis
for($i = 0; $i < $height; $i += 5)
ImageLine($captcha_img, 0, $i, $width, $i , $line_color);
/****************************
* DRAW AND OUTPUT *
* IMAGE *
****************************/
// Draw the random string
ImageString($captcha_img, 5, 5, 2, $captcha_str, $text_color);
// Carry the data (KEY) through session
$_SESSION['key'] = $captcha_str;
// Send data type
header("Content-type: image/jpeg");
// Output image to browser
ImageJPEG($captcha_img);
// Free-Up resources
ImageDestroy($captcha_img);
?>
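One caveat about the md5(microtime()) line above: its input is the clock, so the resulting text can be guessed by someone who knows roughly when the image was requested. The following is a minimal alternative sketch, assuming PHP 7 or later for random_int(). The function name captcha_random_text() and the alphabet are illustrative choices, not part of the original script.

```php
<?php
// Hypothetical replacement for the md5(microtime()) line.
// The alphabet leaves out the look-alike characters 0, O, 1 and I.
function captcha_random_text($length)
{
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $max = strlen($alphabet) - 1;
    $text = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from a cryptographically secure source
        $text .= $alphabet[random_int(0, $max)];
    }
    return $text;
}

$captcha_str = captcha_random_text(5);
```

The result could be used in place of $captcha_str in the script, with CAPTCHA_STRENGTH passed as the length.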
Okay, that's it for this one. In the next one we'll integrate this CAPTCHA script into a form and see
how it works. Till then, goodbye!
( http://www.codewalkers.com/c/a/Miscellaneous/Creating-a-CAPTCHA-with-PHP/1/ )
Note: You may notice session_start() at the top of this script. This starts the session, which
will be used later.
<?php
//Start the session so we can store what the code actually is.
session_start();
//Now let's use md5 to generate a random string. mktime() without arguments
//fails on PHP 8, so we hash 16 random bytes instead (random_bytes() needs PHP 7+)
$md5 = md5(random_bytes(16));
/*
We don't need a 32 character long string so we trim it down to 5
*/
$string = substr($md5,0,5);
?>
Next we will write this string to the image and output it to the user.
2. Writing the text to the image:
Now that we have the text to write we actually need to write it to the image and display it to the
user. This is made fairly easy with GD.
<?php
/*
Now for the GD stuff, for ease of use lets create
the image from a background image.
*/
$captcha = imagecreatefrompng("./captcha.png");
/*
Let's set the colours; the colour $line is used to draw the lines.
We are using a misty blue colour. The colour codes are in RGB.
*/
$black = imagecolorallocate($captcha, 0, 0, 0);
$line = imagecolorallocate($captcha,233,239,239);
/*
Now to make it a little bit harder for any bots to break,
assuming they can get this far, let's add some
static lines to make the bot's life a little harder.
*/
imageline($captcha,0,0,39,29,$line);
imageline($captcha,40,0,64,29,$line);
?>
As you can see from the code above, we are loading the basic image from captcha.png
instead of building the image itself, which could be a little complex for this basic tutorial. When
we use colour in GD we need to allocate the colour to a variable; we do this with
imagecolorallocate(). Once we have the colours stored in their respective variables, we
use them to draw the lines through the image. This is to make the robot's job of cracking the
captcha just that little bit harder, because we are nice to the robots like that :)
Finally we have to write the text to the image, which is made easy with imagestring(). The only
thing left to do with this image is to output it, which is done by setting the content type of the page
to image/png with header() and sending the image to the browser with imagepng(). It is also
worth mentioning that the string is hashed with md5() (a one-way hash, not encryption) and
stored in the session variable $_SESSION['key'].
<?php
/*
Now for the all-important writing of the randomly generated string to the image.
*/
imagestring($captcha, 5, 20, 10, $string, $black);
/*
Hash the key with md5 and store it inside the session
*/
$_SESSION['key'] = md5($string);
/*
Output the image
*/
header("Content-type: image/png");
imagepng($captcha);
?>
Now, assuming that this form has been submitted, we need to check whether the code matches what was
on the image; after all, this is the whole point of a captcha system. You can do this in any PHP file
as long as the form described above submits to it. For basic checking we will use the code below.
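<?php
session_start();
//Read both values defensively; either may be missing or not a string
$code = (isset($_POST['code']) && is_string($_POST['code'])) ? $_POST['code'] : '';
$key = (isset($_SESSION['key']) && is_string($_SESSION['key'])) ? $_SESSION['key'] : '';
//Hash the posted code field and then compare with the stored key.
//hash_equals() does a strict string comparison, unlike !=, which can
//treat two different hashes of the form "0e123..." as equal numbers
if($key === '' || !hash_equals($key, md5($code)))
{
die("Error: You must enter the code correctly");
}else{
echo 'You entered the code correctly';
}
?>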
The session_start() you see here simply continues the session from the previous page, easy
enough. Then it's just a case of simple text matching, which you can see is done by the if
statement.
4. Improvements and Conclusion:
Well, that is all there is to CAPTCHA images: simply writing text to an image and storing
the text (key). However, the captcha I just described is not the best in the world
by a long shot. If you're feeling adventurous you could try the following things:
Andrew Walsh lives in the UK, where he is starting a course in computing and related computer
studies at a local college. He is a programming hobbyist but plans to pursue a career in programming.
He is currently studying and working with PHP, MySQL, (X)HTML, CSS and some limited C++, and sometimes
writes code snippets and articles on his personal "development" website at http://walshdev.com/.
CAPTCHA Applications:
CAPTCHAs have several applications for practical security, including (but not limited to):
Preventing Comment Spam in Blogs:
Most bloggers are familiar with programs that submit bogus comments, usually
for the purpose of raising search engine ranks of some website (e.g., "buy penny stocks
here"). This is called comment spam. By using a CAPTCHA, only humans can enter
comments on a blog. There is no need to make users sign up before they enter a
comment, and no legitimate comments are ever lost!
Protecting Website Registration:
Several companies (Yahoo!, Microsoft, etc.) offer free email services. Up until a
few years ago, most of these services suffered from a specific type of attack: "bots" that
would sign up for thousands of email accounts every minute. The solution to this problem
was to use CAPTCHAs to ensure that only humans obtain free accounts. In general, free
services should be protected with a CAPTCHA in order to prevent abuse by automated
scripts.
Protecting Email Addresses From Scrapers:
Spammers crawl the Web in search of email addresses posted in clear text.
CAPTCHAs provide an effective mechanism to hide your email address from Web
scrapers. The idea is to require users to solve a CAPTCHA before showing your email
address. A free and secure implementation that uses CAPTCHAs to obfuscate an email
address can be found at reCAPTCHA MailHide.
Online Polls:
Worms and Spam:
CAPTCHAs also offer a plausible solution against email worms and spam: "I
will only accept an email if I know there is a human behind the other computer." A few
companies are already marketing this idea.
Types of CAPTCHAs:
Text CAPTCHAs:
1. Normal Type:
These are simple to implement. The simplest yet novel approach is to present the user
with some questions which only a human user can solve. Examples of such questions are:
Such questions are very easy for a human user to solve, but it’s very difficult to program a
computer to solve them. These are also friendly to people with visual disability – such as those
with colour blindness.
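As an illustration only, here is a minimal sketch of such a question-based check. The arithmetic question format and the function names are assumptions made for this sketch, not examples taken from the text above. A question this regular would be easy for a bot to parse, so a real deployment would need more varied questions.

```php
<?php
// Hypothetical question generator: returns the question and its answer.
function text_captcha_question()
{
    $a = random_int(1, 9);
    $b = random_int(1, 9);
    return array(
        'question' => "What is $a plus $b?",
        'answer'   => (string) ($a + $b),
    );
}

// Compare the trimmed user input with the expected answer.
function text_captcha_check(array $captcha, $input)
{
    return is_string($input) && trim($input) === $captcha['answer'];
}

$captcha = text_captcha_question();
echo $captcha['question'], "\n";
var_dump(text_captcha_check($captcha, $captcha['answer'])); // bool(true)
```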
2. Gimpy:
Gimpy is a very reliable text CAPTCHA built by CMU in collaboration with Yahoo for
their Messenger service. Gimpy is based on the human ability to read extremely distorted text
and the inability of computer programs to do the same. Gimpy works by choosing ten words
randomly from a dictionary, and displaying them in a distorted and overlapped manner. Gimpy
then asks the users to enter a subset of the words in the image. The human user is capable of
identifying the words correctly, whereas a computer program cannot.
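The selection and grading part of this scheme can be sketched as follows. This is a hedged sketch, not Gimpy's actual code: the function names, the sample dictionary and the required count of three words are all assumptions, and the distortion and overlapping of the words in the image is left out entirely.

```php
<?php
// Hypothetical sketch: pick $shownCount distinct words at random.
function gimpy_pick_words(array $dictionary, $shownCount)
{
    $words = array_values(array_unique($dictionary));
    // Fisher-Yates shuffle driven by random_int() (PHP 7 or later)
    for ($i = count($words) - 1; $i > 0; $i--) {
        $j = random_int(0, $i);
        $tmp = $words[$i];
        $words[$i] = $words[$j];
        $words[$j] = $tmp;
    }
    return array_slice($words, 0, $shownCount);
}

// The user passes if at least $required distinct entered words were shown.
function gimpy_check(array $shown, array $entered, $required)
{
    $entered = array_unique(array_map('strtolower', array_map('trim', $entered)));
    $shown = array_map('strtolower', $shown);
    return count(array_intersect($entered, $shown)) >= $required;
}

$dictionary = array('apple', 'river', 'stone', 'cloud', 'tiger', 'piano',
                    'lemon', 'chair', 'brick', 'ocean', 'robot', 'candle');
$shown = gimpy_pick_words($dictionary, 10);
var_dump(gimpy_check($shown, array_slice($shown, 0, 3), 3)); // bool(true)
```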
3. EZ-Gimpy:
This is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo in their signup
page. EZ-Gimpy randomly picks a single word from a dictionary and applies distortion to the
text. The user is then asked to identify the text correctly.
4. BaffleText:
1. BONGO:
Bongo is a program that asks the user to solve a visual pattern recognition problem. In
particular, Bongo displays two series of blocks, the left and the right series. The blocks in the left
series differ from those in the right, and the user must find the characteristic that sets the two
series apart. A possible left and right series are shown below:
(These two series are different because everything in the left is drawn with thick lines, while
everything in the right is drawn with thin lines.)
After seeing the two series of blocks, the user is presented with four single blocks and is asked to
determine whether each block belongs to the right series or to the left. The user passes the test if
he or she correctly determines the side to which all four blocks belong.
2. PIX:
PIX is a program that has a large database of labeled images. All of these images are pictures of
concrete objects (a horse, a table, a house, a flower, etc). The program picks an object at random,
finds 4 random images of that object from its database, distorts them at random, presents them to
the user and then asks the question "what are these pictures of?" (See the example below.)
Current computer programs are not able to answer this question.
Audio CAPTCHAs:
The final example we offer is based on sound. The program picks a word or a sequence
of numbers at random, renders the word or the numbers into a sound clip and distorts the sound
clip; it then presents the distorted sound clip to the user and asks users to enter its contents. This
CAPTCHA is based on the difference in ability between humans and computers in recognizing
spoken language. Nancy Chan of the City University in Hong Kong was the first to implement a
sound-based system of this type. The idea is that a human is able to disregard the
distortion and interpret the characters being read out, while software would struggle with the
distortion being applied and would need to be effective at speech-to-text conversion in order to be
successful. This is a crude way to filter humans, and it is not so popular because the user has to
understand the language and the accent in which the sound clip is recorded.
reCAPTCHA:
Digitizing Books One Word at a Time:
reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and
old time radio shows.
A CAPTCHA is a program that can tell whether its user is a human or a computer.
You've probably seen them — colorful images with distorted text at the bottom of Web
registration forms. CAPTCHAs are used by many websites to prevent abuse from "bots," or
automated programs usually written to generate spam. No computer program can read distorted
text as well as humans can, so bots cannot navigate sites protected by CAPTCHAs.
About 200 million CAPTCHAs are solved by humans around the world every day. In
each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of
time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.
What if we could make positive use of this human effort? reCAPTCHA does exactly that by
channeling the effort spent solving CAPTCHAs online into "reading" books.
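As a quick check of the arithmetic, using only the two figures quoted above (200 million CAPTCHAs per day, about ten seconds each):

```php
<?php
// Figures quoted in the text above
$captchas_per_day = 200000000;
$seconds_each = 10;

// Total human time per day, converted from seconds to hours
$hours_per_day = $captchas_per_day * $seconds_each / 3600;

echo number_format($hours_per_day), "\n"; // 555,556
```

Taken at face value, the quoted figures come to roughly 555,000 hours a day, so "more than 150,000 hours" is a conservative statement of the total.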
To archive human knowledge and to make information more accessible to the world,
multiple projects are currently digitizing physical books that were written before the computer
age. The book pages are being photographically scanned, and then transformed into text using
"Optical Character Recognition" (OCR). The transformation into text is useful because scanning
a book produces images, which are difficult to store on small devices, expensive to download,
and cannot be searched. The problem is that OCR is not perfect.
reCAPTCHA improves the process of digitizing books by sending words that cannot be
read by computers to the Web in the form of CAPTCHAs for humans to decipher. More
specifically, each word that cannot be read correctly by OCR is placed on an image and used as a
CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read
correctly.
But if a computer can't read such a CAPTCHA, how does the system know the correct
answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given
to a user in conjunction with another word for which the answer is already known. The user is
then asked to read both words. If they solve the one for which the answer is known, the system
assumes their answer is correct for the new one. The system then gives the new image to a
number of other people to determine, with higher confidence, whether the original answer was
correct.
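The two-word check described above can be sketched as follows. This is a hedged illustration and not reCAPTCHA's actual code: the function names, the in-memory $votes array and the agreement threshold are assumptions made for the sketch.

```php
<?php
// Hypothetical sketch: grade the known word, record a vote for the unknown one.
function recaptcha_submit(array &$votes, $controlAnswer, $unknownId, $typedControl, $typedUnknown)
{
    // The user is judged only on the word whose answer is already known
    if (strcasecmp(trim($typedControl), $controlAnswer) !== 0) {
        return false;
    }
    // A correct control word makes the other answer count as one vote
    $guess = strtolower(trim($typedUnknown));
    if (!isset($votes[$unknownId][$guess])) {
        $votes[$unknownId][$guess] = 0;
    }
    $votes[$unknownId][$guess]++;
    return true;
}

// Return the agreed reading once some answer has $needed votes, else null.
function recaptcha_resolved(array $votes, $unknownId, $needed = 3)
{
    if (!isset($votes[$unknownId])) {
        return null;
    }
    foreach ($votes[$unknownId] as $guess => $count) {
        if ($count >= $needed) {
            return (string) $guess;
        }
    }
    return null;
}

$votes = array();
recaptcha_submit($votes, 'upon', 'w1', 'upon', 'morning');
recaptcha_submit($votes, 'upon', 'w1', 'Upon', 'Morning');
recaptcha_submit($votes, 'upon', 'w1', 'open', 'mourning'); // control word wrong, ignored
var_dump(recaptcha_resolved($votes, 'w1', 2)); // string(7) "morning"
```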
Breaking CAPTCHAs:
Breaking CAPTCHAs without OCR:
Most CAPTCHAs don't destroy the session when the correct phrase is entered. So by
reusing the session id of a known CAPTCHA image, it is possible to automate requests to a
CAPTCHA-protected page.
Manual steps:
Automated steps:
Resend session ID and CAPTCHA plaintext any number of times, changing the user data.
The other user data can change on each request. We can then automate hundreds, if not
thousands of requests, until the session expires, at which point we just repeat the manual steps
and then reconnect with a new session ID and CAPTCHA text.
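One way to close this hole is to discard the stored text as soon as it has been checked, whether the answer was right or wrong, so that a session ID plus a known answer is good for only one request. The following is a minimal sketch, assuming the md5-hashed $_SESSION['key'] convention used earlier in this document. The function name captcha_check_once() is hypothetical, and a plain array stands in for $_SESSION.

```php
<?php
// Hypothetical one-time check: the key is removed before the result is returned.
function captcha_check_once(array &$session, $input)
{
    $key = isset($session['key']) ? $session['key'] : null;
    unset($session['key']); // a replayed request finds no key to match
    return is_string($key)
        && is_string($input)
        && hash_equals($key, md5($input));
}

$session = array('key' => md5('a1b2c'));
var_dump(captcha_check_once($session, 'a1b2c')); // bool(true)
var_dump(captcha_check_once($session, 'a1b2c')); // bool(false), key already used
```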
Spammers often use social engineering to outwit gullible Web users to serve their
purpose. Security firm Trend Micro warns of a Trojan called TROJ_CAPTCHAR, which
masquerades as a strip tease game. At each stage of the game, the user is asked to solve a
CAPTCHA. The result is relayed to a remote server where a malicious user is waiting for them.
The strip-tease game is a ploy by spammers to identify and match solutions for ambiguous
CAPTCHAs from legitimate sites, using the unsuspecting user as the decoder of the said images.
No CAPTCHA can survive a human that’s receiving financial incentives for solving it.
CAPTCHAs are cracked by firms posing as data processing firms. They usually charge $2 for
1000 CAPTCHAs successfully solved. They advertise their business like this: “Using the advertisement
in blogs, social networks, etc significantly increases the efficiency of the business. Many services
use pictures called CAPTCHAs in order to prevent automated use of these services. Solve
CAPTCHAs with the help of this portal; increase your business efficiency now!” Such firms help
spammers in beating the first line of defence for a Website, i.e., CAPTCHAs.