Last Updated: Thursday, February 23, 2017

Captcha

NOTE: This is an old (~2006 vintage) article changed to meet my needs for this webpage. Thus, when you come across "me" and "we" and the like, that is not ME, the author of this website. Some of the information was outdated, and the changes I made were substantial. I cannot attribute this article to anyone because it is so old that I have no idea where or whom it came from. Over time I will further update and add information to this page, too. It only takes time <grin>. This page should really be updated for increased accuracy and appeal.


Captchas

  It used to be said that Captcha Turing Tests are a great way to slow spammers down. Spammers and other annoying jerks have discovered ways to abuse websites that offer contact forms, guest books, feedback pages, and so on. When spammers use automated tools to attack these pages, they can post many unwanted messages very quickly and provoke some interesting warnings and threats from your ISP. To slow down these attacks, many websites started to use CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) software. "Turing tests" in general are used to distinguish between computers and real people. In particular, captcha software (the term is commonly used as an ordinary noun, uncapitalized) tries to provide a test that humans can easily pass but computers will, hopefully, fail.

   BUT, well, those of you who have come across some of that old-time captcha stuff know better, don't you? It DOES keep hackers et al. away, but it ALSO keeps honest visitors out. Today, when I come across one of those unreadable, twisted-text codes, I give it one try at most; if I don't get it right the first time, I just move on. I have better things to spend my time on! Sometimes I can tell as soon as I see the code that it's useless, and I don't even bother trying.

A typical captcha involves a picture of text, usually rotated, distorted, colored, and otherwise creatively altered. The only reason for making the text so hard to read, according to the authors, was to keep bots and hackers from reading it with OCR (Optical Character Recognition) software. Much better approaches have come about since those early days of captcha, and today's codes no longer have to be hard to decipher. The old codes might have kept out hackers et al., but they also kept out honest people who just wanted to use whatever it was the captcha was "protecting".


What the heck IS a Turing Test?
A "Turing test" is any test that attempts to distinguish human beings from computers. The idea of a Turing test is credited to the computer science pioneer Alan Turing, who first described it in 1950. For more information, see Wikipedia's Turing test entry. It was a good concept, but it was misapplied by people who did not stop to think about who else they might be giving grief: the honest user or visitor.


Captchas Are Not Perfect... Not Even Close
Sounds like a good idea, so what's the catch? Well, there are several problems:

1. Computers can break 'em anyway... although amateur programmers won't have an easy time doing so. Greg Mori and Jitendra Malik's "Breaking a Visual CAPTCHA" discusses advanced techniques that can be used to crack even fairly sophisticated captcha systems.

2. Some humans can't break 'em! Obviously, blind users can't solve a visual captcha. Better captcha systems also offer an audio-based option, but even then, deaf-blind users (those who are both deaf and blind) are locked out. Sites employing captchas should at least consider offering special accounts to those with special needs in this area or, more simply, rethink how the captcha is implemented. One solution is to offer a telephone number, and make sure you accept TDD relay calls! These are voice calls placed through an interpreter. Your telephone support staff should be educated about this and encouraged to create accounts or carry out other captcha-protected tasks on behalf of legitimate users who contact you by phone. Of course, that is not very useful advice for the gazillions of personal websites, hobby sites, and so on.

3. Captchas can take up extensive CPU resources (that is, slow down your web server) or require features not present on your website. For example, some web hosts do not include the GD library in their PHP offering, probably because they don't realize how easy it is to add.

4. Bad guys will, in some cases, hire humans to do the data entry instead, or at least the captcha-solving part. If your troublemakers are determined to get past the captcha, they can.

So, does this mean you shouldn't use a captcha? Not at all. Some sites (mine included) are faced with many abusive, unwelcome form submissions every day. For sites like these, a well-designed captcha system makes all the difference. Just don't expect perfection, be sure to include the audio option, and offer alternatives for deaf-blind users. Note that the telephone can be a valid alternative: deaf-blind users with access to the Internet via braille interfaces likely also have access to TDD relay services, which allow them to place voice calls through an interpreter. That is, IF you can afford the equipment and manpower to take those calls.


How To Implement A Captcha
That's enough about the why (and the why not). How do we implement a practical captcha system?
There are many dynamic web programming languages in the world, and I can't cover all of them in every article. Here I'll assume you are using PHP. If you're not using PHP, you should consider it! PHP is the most popular tool of its kind, and it runs on all major web servers.
I have written a simple captcha system in PHP which you can use on your own site. It's easy to set up and extremely easy to plug into your own PHP code. And you can try a live demo here.
The only catches are:
 
1. Your server's PHP must include the GD library, with support for JPEG and FreeType (TrueType font output). If it does, you will see that mentioned on your phpinfo() page. If not, complain to your web host; they aren't doing a good job if they still don't give you these widely expected features by now.

2. If you don't mind the Bitstream Vera font or the sound of my voice, great! If you aren't crazy about those two things, you'll need to provide your own recordings of the letters of the alphabet and your own TrueType font file (.ttf file) as described in the next two steps.

The situation has progressed far enough, however, that there is little need for GD images: the numbers can be shown as real text, fully in the open. The extra accommodations once needed for deaf or blind visitors thus become unnecessary. Ordinary text-to-speech software on a visitor's computer makes the new kind of captcha page work just fine, without any hoops for anyone to jump through. Advances in PHP have also greatly simplified the coding required for captcha-like protection. And plain-text numbers no longer cause problems for folks with reduced color perception, nor do they suffer from the general unreadability of the past.

3. Optional: if you do decide to replace the audio samples, you'll need to record your own, perhaps using Audacity. Name them a.wav, b.wav, etc. (all lower case) and upload them to the fonts subdirectory of your captcha directory. Then convert them to the correct raw audio format by running the convertwavtoub.pl Perl script. That script requires the sox utility, which can be installed as a package in most Linux distributions. If that's not the case for your server (for instance, because your hosting is on Windows), visit the sox project page. Or just use my provided audio samples and don't worry about it.

4. Optional: if you decide to replace the font, you'll need a decent-looking TrueType font file (.ttf), and you need to know the file path to that font on the server. For copyright reasons, fonts are sometimes not included on web server systems. Look in /usr/share/fonts (Linux/Unix) or c:/windows/fonts (Windows), or upload TrueType files of your own. Of course, you can just use the Bitstream Vera font I provide and not worry about this.
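If you're not sure where your server keeps its TrueType files, a small script can look for them. The following is a minimal sketch in Python (not PHP, and not part of the captcha package described here) that searches the two directories named above for .ttf files. The directory list is an assumption; adjust it for your own server.

```python
from pathlib import Path

# Directories named in the article; adjust for your own server (an assumption).
FONT_DIRS = ["/usr/share/fonts", "c:/windows/fonts"]


def find_ttf_fonts(dirs=FONT_DIRS) -> list:
    """Return the full paths of all .ttf files found under the given directories."""
    found = []
    for d in dirs:
        root = Path(d)
        if not root.is_dir():
            continue  # this directory does not exist on this operating system
        # rglob("*") walks subdirectories too; suffix check ignores case (.TTF).
        found.extend(str(p) for p in root.rglob("*") if p.suffix.lower() == ".ttf")
    return sorted(found)


if __name__ == "__main__":
    for path in find_ttf_fonts():
        print(path)
```

Whichever path you pick from the output is the one your captcha code needs to be told about.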

The Basics of how I do "captcha" with PHP:

I use mt_rand() to generate two 4-digit strings.
   I then concatenate those two strings into one.
   I give the visitor 3 questions to answer, mentioning that it is important to remember what their answers were.
       The 3 questions are randomly chosen from a list of 24 questions total. There is, of course, no indication of how many questions there are, or that each one is chosen at random. I might ask for, say, their zip code, street name, age, and so on; they're all simple, easily remembered pieces of information.
   After all the required data, including the captcha code, is filled in and they click Submit, I'll do input sanitizing and cleaning. Any error sends them back to the first page of the form and increments a counter by 1. If the counter reaches 3, they are ceremonially dumped out to the Home Page. Unless they get a new IP, they are banned from using the email form until midnight of the following day at the earliest, but they're only told that they cannot use the email form, period, for the rest of that session. Their IP is recorded and temporarily saved for two days, so unless the IP is renewed, the email form simply will not work for them for 24+ hours.
   If they get that far, two of the three questions asked on the first page are repeated. Correct answers to those questions then grant them access.
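The steps above can be sketched roughly as follows. This is a hypothetical illustration in Python, not the actual PHP behind this site's form (which isn't published). Only the zip code, street name, and age questions come from the text; the rest of the pool is placeholders. Comparing answers case-insensitively is an assumption, and "midnight of the following day" is read here as the midnight that ends tomorrow. The sketch also swaps mt_rand() for Python's secrets module, because a Mersenne Twister generator such as mt_rand() is not cryptographically secure and a challenge code shouldn't be predictable.

```python
import random
import secrets
from datetime import datetime, time, timedelta

# Hypothetical 24-question pool: the first three come from the article,
# the rest are placeholders for a real (unpublished) list.
QUESTION_POOL = [
    "What is your zip code?",
    "What is the name of your street?",
    "How old are you?",
] + [f"Placeholder question {n}" for n in range(4, 25)]

MAX_FAILURES = 3
_rng = random.SystemRandom()  # draws from the OS, not a seeded Mersenne Twister


def make_code() -> str:
    """Concatenate two random 4-digit strings (leading zeros kept) into one code."""
    first = f"{secrets.randbelow(10_000):04d}"
    second = f"{secrets.randbelow(10_000):04d}"
    return first + second


def pick_questions(pool=QUESTION_POOL, k=3) -> list:
    """Choose k distinct questions at random for the first page."""
    return _rng.sample(pool, k)


def pick_repeats(asked: list, k=2) -> list:
    """Choose which of the first-page questions to ask again on page two."""
    return _rng.sample(asked, k)


def _norm(answer: str) -> str:
    # Ignore case and extra whitespace when comparing answers (an assumption).
    return " ".join(answer.split()).lower()


def repeats_match(page_one: dict, page_two: dict) -> bool:
    """True only if every repeated question got the same non-empty answer."""
    if not page_two:
        return False
    for question, answer in page_two.items():
        expected = _norm(page_one.get(question, ""))
        if not expected or _norm(answer) != expected:
            return False
    return True


class Lockout:
    """Per-IP failure counter: the third failure locks the form for 24+ hours."""

    def __init__(self):
        self.failures = {}      # ip -> number of rejected submissions
        self.banned_until = {}  # ip -> datetime when the ban lifts

    def is_banned(self, ip: str, now: datetime) -> bool:
        until = self.banned_until.get(ip)
        return until is not None and now < until

    def record_failure(self, ip: str, now: datetime) -> int:
        """Count one failure; on reaching MAX_FAILURES, ban until end of tomorrow."""
        count = self.failures.get(ip, 0) + 1
        self.failures[ip] = count
        if count >= MAX_FAILURES:
            # Midnight that ends tomorrow = start of the day after tomorrow.
            self.banned_until[ip] = datetime.combine(
                now.date() + timedelta(days=2), time.min
            )
        return count
```

In use, a form handler would call make_code() and pick_questions() when drawing the first page, check is_banned() and call record_failure() on each rejected submission, and use pick_repeats() plus repeats_match() on the second page. A real site would keep the failure counts in a file or database rather than in memory, so they survive between requests.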

So really, all they have to do is answer three questions on one page, repeat two of those answers on the next page, and they're IN! If I find or suspect a hacking attempt, I'll go to the .htaccess file and permanently ban their IP range, their ISP, or whatever else I gathered along the way. There is enough publicly available data for comparison to help decide whether it's the same source or not. Google Analytics turns much of this record keeping into simple monitoring, taking the load off my own code and server.
   Then the sessions are destroyed, everything is torn down and killed off.
   Oh, and using the Back button takes them back a page as expected, but it'll be asking for answers they haven't given, since the questions are ALL refreshed at screen-paint time, including the first page's random number. That was a little hard to figure out, but I finally managed it.
   This or similar methods are used on each and every website I work on.
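The Back-button behavior described above comes down to regenerating the challenge on every render and checking a submission only against the most recent one. Here is a minimal Python sketch of that idea. FormSession is a hypothetical stand-in for PHP's session storage, and the random token stands in for the page's code and questions. It assumes the page is sent with headers that discourage caching (for example Cache-Control: no-store) so the browser requests it again instead of showing a cached copy; browsers differ in how strictly they honor that on Back navigation.

```python
import secrets


class FormSession:
    """Hypothetical per-visitor session: each render issues a fresh challenge."""

    def __init__(self):
        self.data = {}  # page name -> token of the most recent render

    def render(self, page: str) -> str:
        # Every render, including one reached via the Back button, replaces the
        # stored challenge, so anything answered against an older copy fails.
        token = secrets.token_hex(16)
        self.data[page] = token
        return token

    def accept(self, page: str, token: str) -> bool:
        """Accept a submission only if it answers the latest render of that page."""
        expected = self.data.get(page)
        return expected is not None and secrets.compare_digest(expected, token)

    def destroy(self):
        # Tear everything down once the message has been sent.
        self.data.clear()
```

With this design, an old tab or a cached page can never be replayed: only the challenge most recently drawn for that visitor counts, and after destroy() nothing counts at all.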

I will not provide more details on just how I accomplish the captcha-like access, but compared to a captcha, it's not driving away any visitors.

I would, however, appreciate any constructive criticism or comments on this page. I'm always open to better ways to do things. If you do respond, remember: do NOT include any personal information other than your email address, and include that only if you wish to receive a response.

The same is true of all the new versions of captcha that have come into existence: they are just no good and too easy to break. There are many other methods of accomplishing the same goal that DO work! A little research is all it takes.

 

   Copyright © 2017; All Rights Reserved by twaynesdomain.com. No material of any kind may be reproduced in any manner nor displayed anywhere without explicit written permission of the copyright holder, namely Tom Rivet, unless indicated otherwise on specific pages.

By viewing any page on this website, you are bound by all Policies and Regulations in effect at the time of your visit. Ignorance of the Policies is not an excuse for violating them.
