Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA)

CAPTCHA: Telling Humans and Computers Apart Automatically

A CAPTCHA is a program that protects websites against bots by generating and grading tests
that humans can pass but current computer programs cannot. For example, humans can read
distorted text such as the one shown below, but current computer programs can't:

The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and
Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and
John Langford of Carnegie Mellon University.

Today we are going to see how CAPTCHA (Completely Automated Public Turing test to tell
Computers and Humans Apart) works and how it minimizes automated form submissions such as
sign-ups. We will also create a simple CAPTCHA script in PHP to illustrate this.

Basically, CAPTCHA works in the following manner:

1. Create Random Value: A random string is generated. Random values are hard to guess
and predict.
2. Generate an Image: Images are used because they are generally a lot harder for
computers to read while remaining readable to humans. This is also the most important step,
as simple text in images can be read (and the CAPTCHA cracked) quite easily. To make it
difficult, developers employ different techniques so that the text in the image
becomes hard for computers to read. Some draw zig-zag lines in the background while
others twist and turn individual characters in the image. The possibilities are many and new
techniques are being developed all the time, as crackers are always looking for ways to
break them.
3. Store it: The random string generated (which is also in the image) is stored for matching
against the user input. The easiest way to do so is to use session variables.
4. Matching: After the above step, the CAPTCHA image is generated and shown on the
form which we want to protect from abuse. The user fills in the form along with
the CAPTCHA text and submits it. Now we have the following:
   1. All submitted form data.
   2. The CAPTCHA string (from the form), entered by the user.
   3. The CAPTCHA string (the real one, generated by us), from the session variable. A session
   variable is generally used because it keeps stored values across page requests. Here,
   we need to preserve the value from one page (the form page) to another (the action
   page that receives the form data).
5. Compare: If both strings match, the submission is accepted. Otherwise we can tell the user
that the CAPTCHA they entered was wrong and that their form could not be submitted. You
could also ask them to try again.
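
The store-and-match part of the steps above can be sketched as follows. This is a minimal sketch, not the script developed later in this text: the function names are hypothetical, `$session` stands in for `$_SESSION` so the code can run outside a web server, and `random_bytes()` is used as the random source.

```php
<?php
// Hypothetical helpers for illustration; $session stands in for $_SESSION.

/** Steps 1 and 3: create a random string and remember it for this visitor. */
function captcha_issue(array &$session, int $length = 5): string
{
    // random_bytes() draws from the operating system's random source (PHP 7+).
    $text = substr(bin2hex(random_bytes(16)), 0, $length);
    $session['key'] = $text;
    return $text; // step 2 would draw this string into an image
}

/** Steps 4 and 5: compare the user's input with the stored string. */
function captcha_matches(array $session, string $submitted): bool
{
    // hash_equals() compares the two strings in constant time.
    return isset($session['key'])
        && hash_equals($session['key'], strtolower(trim($submitted)));
}
```

On a real page, `captcha_issue($_SESSION)` would be called by the image script and `captcha_matches($_SESSION, $_POST['code'])` by the page that receives the form.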

The following image may illustrate this better:


How CAPTCHA is Generated and Matched

From the above image it's quite clear that when someone requests the form page, the CAPTCHA
text is generated and sent back to the requesting user, but only in the form of an image. A human
requester would not have much difficulty reading the image and entering the text when asked,
but a bot would have difficulty guessing what's in the image. In the next step, when we match the
generated string against the one the user entered, we can restrict automated form submissions.

The following code does this. It just outputs the CAPTCHA image to the browser when the
script is requested:

<?php
/********************************************************
 * File:        captcha.php                             *
 * Author:      Arvind Gupta (www.arvindgupta.co.in)    *
 * Date:        12-Mar-2009                             *
 * Description: This file can be embedded as an image   *
 *              to show a CAPTCHA.                      *
 ********************************************************/

// The number of characters you
// want your CAPTCHA text to have
define('CAPTCHA_STRENGTH', 5);

/****************************
 *        INITIALISE        *
 ****************************/
// Tell PHP we're going to use
// Session vars
session_start();

// Md5 to generate the random string
$random_str = md5(microtime());

// Trim required number of characters
$captcha_str = substr($random_str, 0, CAPTCHA_STRENGTH);

// Allocate new image
$width = (CAPTCHA_STRENGTH * 10)+10;
$height = 20;

$captcha_img = ImageCreate($width, $height);

// ALLOCATE COLORS
// Background color-black
$back_color = ImageColorAllocate($captcha_img, 0, 0, 0);

// Text color-white
$text_color = ImageColorAllocate($captcha_img, 255, 255, 255);

// Line color-red
$line_color = ImageColorAllocate($captcha_img, 255, 0, 0);

/****************************
 *     DRAW BACKGROUND &    *
 *           LINES          *
 ****************************/
// Fill background color
ImageFill($captcha_img, 0, 0, $back_color);

// Draw vertical lines, spaced 5 pixels apart along the x-axis
for ($i = 0; $i < $width; $i += 5) {
    ImageLine($captcha_img, $i, 0, $i, $height, $line_color);
}
// Draw horizontal lines, spaced 5 pixels apart along the y-axis
for ($i = 0; $i < $height; $i += 5) {
    ImageLine($captcha_img, 0, $i, $width, $i, $line_color);
}

/****************************
 *      DRAW AND OUTPUT     *
 *          IMAGE           *
 ****************************/
// Draw the random string
ImageString($captcha_img, 5, 5, 2, $captcha_str, $text_color);

// Carry the data (KEY) through session
$_SESSION['key'] = $captcha_str;

// Send data type
header("Content-type: image/jpeg");

// Output image to browser
ImageJPEG($captcha_img);

// Free-Up resources
ImageDestroy($captcha_img);
?>

Okay, that's it for this part. In the next one we'll integrate this CAPTCHA script into a form and see
how it works. Till then, goodbye!
( http://www.codewalkers.com/c/a/Miscellaneous/Creating-a-CAPTCHA-with-PHP/1/ )

1. Random text generated

2. Text written to image

3. Text stored in session/cookie/database

4. Image displayed to user

5. User enters the code

6. User entered code is checked against the stored key

7. If they match then something is done

1. Creating the random text:


Right now we are up to generating the random text. To do this I will use the PHP functions
microtime() and mktime() to generate a number (on PHP 8, where mktime() requires arguments,
time() does the same job). This number will then be hashed using md5(). md5() is a hash, not
encryption. From the resulting 32-character string we then use substr() to cut out a
5-character string. This is our random text.

Note: You may notice session_start() at the top of this script. This starts the session, which
will be used later.

<?php
//Start the session so we can store what the code actually is.
session_start();

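// Now let's use md5 to turn a time-based number into a random-looking string.
// time() stands in for the argument-less mktime() call, which PHP 8 rejects;
// microtime(true) returns a float, so the multiplication is well defined.
$md5 = md5((string) (microtime(true) * time()));

/*
We don't need a 32 character long string so we trim it down to 5
*/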
$string = substr($md5,0,5);
?>
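
A string derived from the clock can be predicted by anyone who knows roughly when the page was requested. As a hedged alternative, the sketch below builds the text with `random_int()`, which is cryptographically secure (PHP 7+). The function name and the alphabet are illustrative choices, not part of the tutorial's script.

```php
<?php
/**
 * Build a CAPTCHA string from characters that are hard to confuse.
 * Hypothetical helper; the alphabet is an assumption made for readability.
 */
function random_captcha_text(int $length = 5): string
{
    // Look-alike characters (0/o, 1/l/i) are left out on purpose.
    $alphabet = 'abcdefghjkmnpqrstuvwxyz23456789';
    $max = strlen($alphabet) - 1;
    $text = '';
    for ($i = 0; $i < $length; $i++) {
        // Pick one character uniformly at random from the alphabet.
        $text .= $alphabet[random_int(0, $max)];
    }
    return $text;
}
```

The result can be used in place of `$string` in the rest of the script.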

Next we will write this string to the image and output it to the user.

2. Writing the text to the image:

Now that we have the text to write, we need to write it to the image and display it to the
user. This is made fairly easy with GD.

<?php
/*
Now for the GD stuff. For ease of use, let's create
the image from a background image.
*/

$captcha = imagecreatefrompng("./captcha.png");

/*
Let's set the colours. The colour $line is used to draw the lines;
it is a misty blue. The colour codes are in RGB.
*/

$black = imagecolorallocate($captcha, 0, 0, 0);
$line = imagecolorallocate($captcha, 233, 239, 239);

/*
Now to make it a little bit harder for any bots to break,
assuming they can break it so far, let's add some (static) lines
to make the bot's life a little harder.
*/
imageline($captcha,0,0,39,29,$line);
imageline($captcha,40,0,64,29,$line);
?>

As you can see from the code above, we are loading the basic image from captcha.png
instead of building the image itself, which could be a little complex for this basic tutorial. When
we use colour in GD we need to allocate the colour to a variable; we do this with
imagecolorallocate(). Once we have the colours stored inside the respective variables, we
use them to draw the lines through the image. This is to make the robot's job of cracking the
captcha just that little bit harder, because we are nice to the robots like that :)
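
Because the two lines in the script are static, they are identical in every image. One possible refinement is to pick the end points at random on each request. The helper below is a hypothetical sketch that only computes coordinates; drawing them is left to `imageline()`, as in the script above.

```php
<?php
/**
 * Pick random end points for $count noise lines inside a
 * $width x $height image. Returns rows of [x1, y1, x2, y2].
 * Hypothetical helper for illustration, not part of GD.
 */
function random_lines(int $width, int $height, int $count = 2): array
{
    $lines = [];
    for ($i = 0; $i < $count; $i++) {
        $lines[] = [
            random_int(0, $width - 1), random_int(0, $height - 1),
            random_int(0, $width - 1), random_int(0, $height - 1),
        ];
    }
    return $lines;
}

// Possible use with the $captcha image and $line colour from the script:
// foreach (random_lines(imagesx($captcha), imagesy($captcha)) as [$x1, $y1, $x2, $y2]) {
//     imageline($captcha, $x1, $y1, $x2, $y2, $line);
// }
```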

Finally we have to write the text to the image, which is made easy with imagestring(). The only
thing left to do on this image is to output it, which is done by setting the content type of the page
to image/png with header() and outputting the image to the browser with imagepng(). It is also
worth mentioning that the string is hashed with md5() and stored in the session variable
$_SESSION['key'].

<?php
/*
Now for the all-important writing of the randomly generated string to the image.
*/
imagestring($captcha, 5, 20, 10, $string, $black);
/*
Hash the key with md5() and store it in the session
*/

$_SESSION['key'] = md5($string);

/*
Output the image
*/
header("Content-type: image/png");
imagepng($captcha);
?>

3. Check if the user entered the code correctly:


To check if the user entered the code correctly you must first allow the user to do this. You can
do this with a simple form that requires a code to be entered; a simple text field called code
or something similar should do nicely. Then you just display the image to the user with a simple
<img src="captcha.php" border="0"> tag. Building such a form is too basic to cover here; if you
don't know how to make a form like the one described above, this tutorial is probably not for you.

Now, assuming that this form has been submitted, we need to check whether the code matches what
was on the image; after all, this is the whole point of a captcha system. You can do this in any PHP
file as long as the form described above submits to it. For basic checking we will use the code below.

<?php
session_start();

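// Hash the posted code field and then compare it with the stored key.
// A missing field or key counts as a failure, and the strict comparison
// avoids PHP's loose matching of numeric-looking hash strings.
$code = $_POST['code'] ?? '';

if (!is_string($code) || !isset($_SESSION['key']) || md5($code) !== $_SESSION['key'])
{
  die("Error: You must enter the code correctly");
}else{
  echo 'You entered the code correctly';
}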
?>

The session_start() you see here simply continues the session from the previous page, easy
enough. Then it's just a case of simple text matching, which you can see is done by the if
statement.
4. Improvements and Conclusion:

Well, that is all there is to CAPTCHA images: just writing text to an image and storing
the text (key). However, the captcha I just described is not the best in the world
by a long shot. If you're feeling adventurous you could try the following things:

• Use a TTF font
• Move the lines randomly
• Randomly position the text on the image
• Rotate the text randomly
• Use words instead of that string (i.e., have a randomly picked word out of, say, a file of about
1000)

About the Author

Andrew Walsh lives in the UK, where he is starting a course of computing and related computer
studies at a local college. He is a programming hobbyist who plans to pursue a career in programming.
He is currently studying and working with PHP, MySQL, (X)HTML, CSS and some limited C++, and sometimes
writes code snippets and articles on his personal "development" website at http://walshdev.com/.

CAPTCHA Applications:

CAPTCHAs have several applications for practical security, including (but not limited to):

• Preventing Comment Spam in Blogs:

Most bloggers are familiar with programs that submit bogus comments, usually
for the purpose of raising search engine ranks of some website (e.g., "buy penny stocks
here"). This is called comment spam. By using a CAPTCHA, only humans can enter
comments on a blog. There is no need to make users sign up before they enter a
comment, and no legitimate comments are ever lost!

• Protecting Website Registration:

Several companies (Yahoo!, Microsoft, etc.) offer free email services. Up until a
few years ago, most of these services suffered from a specific type of attack: "bots" that
would sign up for thousands of email accounts every minute. The solution to this problem
was to use CAPTCHAs to ensure that only humans obtain free accounts. In general, free
services should be protected with a CAPTCHA in order to prevent abuse by automated
scripts.

• Protecting Email Addresses From Scrapers:

Spammers crawl the Web in search of email addresses posted in clear text.
CAPTCHAs provide an effective mechanism to hide your email address from Web
scrapers. The idea is to require users to solve a CAPTCHA before showing your email
address. A free and secure implementation that uses CAPTCHAs to obfuscate an email
address can be found at reCAPTCHA MailHide.

• Online Polls:

In November 1999, http://www.slashdot.org released an online poll asking which
was the best graduate school in computer science (a dangerous question to ask over the
web!). As is the case with most online polls, IP addresses of voters were recorded in
order to prevent single users from voting more than once. However, students at Carnegie
Mellon found a way to stuff the ballots using programs that voted for CMU thousands of
times. CMU's score started growing rapidly. The next day, students at MIT wrote their
own program and the poll became a contest between voting "bots." MIT finished with
21,156 votes, Carnegie Mellon with 21,032 and every other school with less than 1,000.
Can the result of any online poll be trusted? Not unless the poll ensures that only humans
can vote.

• Preventing Dictionary Attacks:

CAPTCHAs can also be used to prevent dictionary attacks in password systems.
The idea is simple: prevent a computer from being able to iterate through the entire space
of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful
logins. This is better than the classic approach of locking an account after a sequence of
unsuccessful logins, since doing so allows an attacker to lock accounts at will.
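
A minimal sketch of this policy follows, assuming a per-account counter of failed logins. The threshold of three failures and the function names are illustrative assumptions, not taken from any particular system.

```php
<?php
// Assumed threshold: ask for a CAPTCHA once this many logins have failed.
const CAPTCHA_AFTER_FAILURES = 3;

/** True when the next login attempt must also solve a CAPTCHA. */
function needs_captcha(int $failures): bool
{
    return $failures >= CAPTCHA_AFTER_FAILURES;
}

/**
 * Handle one login attempt. $failures is the account's failure counter,
 * $passwordOk and $captchaOk are the results of the two checks.
 */
function attempt_login(int &$failures, bool $passwordOk, bool $captchaOk): string
{
    if (needs_captcha($failures) && !$captchaOk) {
        // The password is not even evaluated, so a bot cannot keep guessing.
        return 'captcha_required';
    }
    if ($passwordOk) {
        $failures = 0; // a successful login resets the counter
        return 'ok';
    }
    $failures++;
    return 'wrong_password';
}
```

The account is never locked: a legitimate user can always get in by solving the CAPTCHA, while an automated guesser is stopped at the first `captcha_required`.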

• Search Engine Bots:

It is sometimes desirable to keep webpages unindexed to prevent others from
finding them easily. There is an HTML tag to prevent search engine bots from reading web
pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves
to say "no bots, please." Search engine bots, since they usually belong to large
companies, respect web pages that don't want to allow them in. However, in order to truly
guarantee that bots won't enter a web site, CAPTCHAs are needed.

• Worms and Spam:

CAPTCHAs also offer a plausible solution against email worms and spam: "I
will only accept an email if I know there is a human behind the other computer." A few
companies are already marketing this idea.

Spammers' Success Rate:

In February 2008 it was reported that spammers, using a bot, had achieved a success rate of 30% to 35%
in responding to CAPTCHAs for Microsoft's Live Mail service and a success rate of 20%
against Google's Gmail CAPTCHA. A Newcastle University research team has defeated the
segmentation part of Microsoft's CAPTCHA with a 90% success rate, and claims that this could lead to a
complete crack with a greater than 60% rate.

Types of CAPTCHAs:
Text CAPTCHAs:

1. Normal Type:

These are simple to implement. The simplest yet novel approach is to present the user
with some questions which only a human user can solve. Examples of such questions are:

1. What is twenty minus three?
2. What is the third letter in UNIVERSITY?
3. Which of Yellow, Thursday and Richard is a colour?
4. If yesterday was a Sunday, what is today?

Such questions are very easy for a human user to solve, but it’s very difficult to program a
computer to solve them. These are also friendly to people with visual disability – such as those
with colour blindness.
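
A question of the first kind can be generated rather than picked from a fixed list. The sketch below is illustrative only: the function names are hypothetical, and it accepts the answer in digits, not in words.

```php
<?php
/**
 * Build a subtraction question such as "What is twenty minus three?".
 * Returns the question text and the expected answer (in digits).
 * Hypothetical helper for illustration.
 */
function make_question(): array
{
    $words = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
        'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
        'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen', 'twenty',
    ];
    $a = random_int(10, 20); // first operand, written out as a word
    $b = random_int(0, 9);   // second operand, always smaller than $a
    return [
        'question' => "What is {$words[$a]} minus {$words[$b]}?",
        'answer'   => (string) ($a - $b),
    ];
}

/** Compare the user's input with the expected answer, ignoring spaces. */
function check_answer(array $question, string $input): bool
{
    return trim($input) === $question['answer'];
}
```

As with image CAPTCHAs, the expected answer would be kept in the session between the form page and the page that receives the form.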

2. Gimpy:

Gimpy is a very reliable text CAPTCHA built by CMU in collaboration with Yahoo for
their Messenger service. Gimpy is based on the human ability to read extremely distorted text
and the inability of computer programs to do the same. Gimpy works by choosing ten words
randomly from a dictionary, and displaying them in a distorted and overlapped manner. Gimpy
then asks the users to enter a subset of the words in the image. The human user is capable of
identifying the words correctly, whereas a computer program cannot.

3. Ez – Gimpy:

This is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo in their signup
page. Ez – Gimpy randomly picks a single word from a dictionary and applies distortion to the
text. The user is then asked to identify the text correctly.

4. BaffleText:

This was developed by Henry Baird at the University of California at Berkeley. It is
a variation of Gimpy. It doesn't contain dictionary words; instead it picks random
letters to create nonsense but pronounceable text. Distortions are then added to this text and
the user is challenged to guess the right word. This technique overcomes a drawback of
Gimpy: because Gimpy uses dictionary words, clever bots could be
designed to check the dictionary for the matching word by brute force.
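
To give a feel for "nonsense but pronounceable" text, the sketch below alternates consonants and vowels. This is not BaffleText's own generator, which is not described here; it is a much simpler stand-in, and the function name and letter sets are assumptions.

```php
<?php
/**
 * Build a pronounceable nonsense word by alternating consonants and vowels.
 * Illustrative stand-in only; not the method used by BaffleText.
 */
function nonsense_word(int $length = 6): string
{
    $consonants = 'bcdfghjklmnprstvz';
    $vowels = 'aeiou';
    $word = '';
    for ($i = 0; $i < $length; $i++) {
        // Even positions get a consonant, odd positions a vowel.
        $pool = ($i % 2 === 0) ? $consonants : $vowels;
        $word .= $pool[random_int(0, strlen($pool) - 1)];
    }
    return $word;
}
```

Because the result is not a dictionary word, a bot cannot narrow its guesses by matching against a word list.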

Graphic Based Captcha:

1. BONGO:

Bongo is a program that asks the user to solve a visual pattern recognition problem. In
particular, Bongo displays two series of blocks, the left and the right series. The blocks in the left
series differ from those in the right, and the user must find the characteristic that sets the two
series apart. A possible left and right series are shown below:

(These two series are different because everything in the left is drawn with thick lines, while
everything in the right is drawn with thin lines.)

After seeing the two series of blocks, the user is presented with four single blocks and is asked to
determine whether each block belongs to the right series or to the left. The user passes the test if
he or she correctly determines the side to which all four blocks belong.

2. PIX:

PIX is a program that has a large database of labeled images. All of these images are pictures of
concrete objects (a horse, a table, a house, a flower, etc). The program picks an object at random,
finds 4 random images of that object from its database, distorts them at random, presents them to
the user and then asks the question "what are these pictures of?" (See the example below.)
Current computer programs are not able to answer this question.

Audio CAPTCHAs:
The final example we offer is based on sound. The program picks a word or a sequence
of numbers at random, renders the word or the numbers into a sound clip and distorts the sound
clip; it then presents the distorted sound clip to the user and asks users to enter its contents. This
CAPTCHA is based on the difference in ability between humans and computers in recognizing
spoken language. Nancy Chan of the City University in Hong Kong was the first to implement a
sound-based system of this type. The idea is that a human is able to efficiently disregard the
distortion and interpret the characters being read out, while software would struggle with the
distortion and would need to be effective at speech-to-text conversion in order to be
successful. This is a crude way to filter humans, and it is not so popular because the user has to
understand the language and the accent in which the sound clip is recorded.

reCAPTCHA:
Digitizing Books One Word at a Time:

reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and
old time radio shows.

A CAPTCHA is a program that can tell whether its user is a human or a computer.
You've probably seen them — colorful images with distorted text at the bottom of Web
registration forms. CAPTCHAs are used by many websites to prevent abuse from "bots," or
automated programs usually written to generate spam. No computer program can read distorted
text as well as humans can, so bots cannot navigate sites protected by CAPTCHAs.

About 200 million CAPTCHAs are solved by humans around the world every day. In
each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of
time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.
What if we could make positive use of this human effort? reCAPTCHA does exactly that by
channeling the effort spent solving CAPTCHAs online into "reading" books.

To archive human knowledge and to make information more accessible to the world,
multiple projects are currently digitizing physical books that were written before the computer
age. The book pages are being photographically scanned, and then transformed into text using
"Optical Character Recognition" (OCR). The transformation into text is useful because scanning
a book produces images, which are difficult to store on small devices, expensive to download,
and cannot be searched. The problem is that OCR is not perfect.

reCAPTCHA improves the process of digitizing books by sending words that cannot be
read by computers to the Web in the form of CAPTCHAs for humans to decipher. More
specifically, each word that cannot be read correctly by OCR is placed on an image and used as a
CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read
correctly.

But if a computer can't read such a CAPTCHA, how does the system know the correct
answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given
to a user in conjunction with another word for which the answer is already known. The user is
then asked to read both words. If they solve the one for which the answer is known, the system
assumes their answer is correct for the new one. The system then gives the new image to a
number of other people to determine, with higher confidence, whether the original answer was
correct.
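
The pairing of a known control word with an unknown word can be sketched as follows. The function names, the in-memory vote store and the three-vote threshold are assumptions for illustration; the text above does not say how many agreeing answers the real service requires.

```php
<?php
/**
 * Record one user's answers. The guess for the unknown word is only
 * counted when the control word was typed correctly.
 * $votes maps an unknown-word id to [guess => number of votes].
 */
function submit_pair(
    array &$votes,
    string $controlAnswer,
    string $controlGuess,
    string $unknownId,
    string $unknownGuess
): bool {
    if (strcasecmp(trim($controlGuess), $controlAnswer) !== 0) {
        return false; // failed the known word: treat as a failed CAPTCHA
    }
    $guess = strtolower(trim($unknownGuess));
    $votes[$unknownId][$guess] = ($votes[$unknownId][$guess] ?? 0) + 1;
    return true;
}

/** Return the reading that enough users agree on, or null if none yet. */
function agreed_reading(array $votes, string $unknownId, int $minVotes = 3): ?string
{
    foreach ($votes[$unknownId] ?? [] as $guess => $count) {
        if ($count >= $minVotes) {
            return (string) $guess;
        }
    }
    return null;
}
```

A word whose reading has been agreed on could then be used as a control word for later challenges.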

Breaking CAPTCHAs:
Breaking CAPTCHAs without OCR:

Most CAPTCHAs don't destroy the session when the correct phrase is entered. So by
reusing the session id of a known CAPTCHA image, it is possible to automate requests to a
CAPTCHA-protected page.

Manual steps:

• Connect to CAPTCHA page
• Record session ID and CAPTCHA plaintext

Automated steps:

Resend session ID and CAPTCHA plaintext any number of times, changing the user data.
The other user data can change on each request. We can then automate hundreds, if not
thousands of requests, until the session expires, at which point we just repeat the manual steps
and then reconnect with a new session ID and CAPTCHA text.

Traditional CAPTCHA-breaking software uses image recognition routines to
decode CAPTCHA images. The session-reuse approach bypasses the need to do any of that, making it
easy to get past CAPTCHA images.
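
The usual defence against this replay is to make each stored answer single-use. The sketch below is a hypothetical check that discards the key as soon as it has been compared, whether the answer was right or wrong. `$session` stands in for `$_SESSION`, and the key is assumed to be stored as an md5 hash, as in the tutorial script earlier in this text.

```php
<?php
/**
 * Check a submitted CAPTCHA answer exactly once.
 * The stored key is removed before the result is returned, so resending
 * the same session ID and answer fails on every later request.
 */
function captcha_check_once(array &$session, string $submitted): bool
{
    $expected = $session['key'] ?? null;
    unset($session['key']); // invalidate on success and on failure

    return is_string($expected) && hash_equals($expected, md5($submitted));
}
```

After a failed or successful check the form page has to request a new image, which generates and stores a fresh key.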

Social Engineering used to break CAPTCHAs:

Spammers often use social engineering to trick gullible Web users into serving their
purpose. Security firm Trend Micro warns of a Trojan called TROJ_CAPTCHAR, which
masquerades as a strip tease game. At each stage of the game, the user is asked to solve a
CAPTCHA. The result is relayed to a remote server where a malicious user is waiting for them.
The strip-tease game is a ploy by spammers to identify and match solutions for ambiguous
CAPTCHAs from legitimate sites, using the unsuspecting user as the decoder of the said images.

CAPTCHA cracking as a business:

No CAPTCHA can survive a human who is receiving financial incentives for solving it.
CAPTCHAs are cracked by firms posing as data processing firms. They usually charge $2 for
1000 CAPTCHAs successfully solved. They advertise their business as follows: “Using the advertisement
in blogs, social networks, etc significantly increases the efficiency of the business. Many services
use pictures called CAPTCHAs in order to prevent automated use of these services. Solve
CAPTCHAs with the help of this portal; increase your business efficiency now!” Such firms help
spammers in beating the first line of defence for a Website, i.e., CAPTCHAs.
