I spent the day yesterday installing reCAPTCHA to help combat spam I’ve been getting on this and some other websites. I’ve known about the technology for a while, but I really hadn’t realised how far it had come.

Quick History:
The term “CAPTCHA” was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford (all of Carnegie Mellon University). It is an acronym based on the word “capture”, and it stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”.

Well, that’s where it started, and the idea is quite noble. Spam is suppressed because bots can’t pass the test. We use computers to generate and assess a test that humans can generally pass but the computers themselves can’t. The video below is from the designer of reCAPTCHA, and he details why this system is better.

It is.

Basically, the time people spend solving CAPTCHAs is “wasted” time. It is unproductive. However, the reCAPTCHA project “uses” this time constructively. There are many large projects that are digitising old books, and the process involves scanning these books and using OCR to transcribe them. But OCR has the same problem as a bot facing a CAPTCHA: it can’t decipher all the words. This is where reCAPTCHA comes in. The images you see are words from scanned documents.
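To make the idea a bit more concrete, here is a rough Python sketch of how I understand the two-word check to work. It is only an illustration of the principle: the class name, the method names and the "votes needed" threshold are all my own inventions, not anything from the real reCAPTCHA service. The assumption is that one word already has a known answer (the control word) and the other is a scanned word that OCR couldn't read.

```python
from collections import Counter


def normalise(word):
    """Compare answers case-insensitively and ignore stray spaces."""
    return word.strip().lower()


class TwoWordChallenge:
    """Hypothetical model of a two-word test (not the real reCAPTCHA API)."""

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = control_answer  # the word with a known answer
        self.votes_needed = votes_needed      # assumed agreement threshold
        self.unknown_votes = Counter()        # human guesses for the unread word

    def submit(self, control_guess, unknown_guess):
        """Pass or fail on the control word only; the other guess is a vote."""
        if normalise(control_guess) != normalise(self.control_answer):
            return False
        self.unknown_votes[normalise(unknown_guess)] += 1
        return True

    def transcription(self):
        """Return the unread word once enough people agree, else None."""
        if not self.unknown_votes:
            return None
        word, count = self.unknown_votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


if __name__ == "__main__":
    challenge = TwoWordChallenge("morning", votes_needed=2)
    print(challenge.submit("morning", "xyzzy"))  # nonsense for the unread word
    print(challenge.submit("morning", "upon"))
    print(challenge.submit("Morning", "upon"))
    print(challenge.transcription())
```

Because only the control word is checked, a single person typing nonsense for the other word still passes in this sketch; the nonsense is simply outvoted once enough other people agree on a reading.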

reCAPTCHA actually uses the human who is passing the test to solve OCR problems that computers can’t. I’m not doing the project justice. Check out the following document for some real-world examples. This is pretty good stuff.

Once you’ve checked that out, have a look at the following video from the reCAPTCHA team/project.

Oh, and by the way, back in 2009, reCAPTCHA was acquired by Google.

Let’s go back further: a few years back I watched a video in which another guy had a very similar idea for cataloguing all the images on the internet. Unfortunately, the video is long, but he came up with a novel way of doing it. He created a game in which people played a (re)CAPTCHA style of game. The funny part is that CAPTCHAs annoy people, yet people played this game voluntarily.

Edit: this has been installed for a few days now and I haven’t had any spam since. Worth the free price I paid and the 15 minutes to install! (I have a multi-site system.)

4 Replies to “reCAPTCHA”

    1. I was going to delete this comment as spam but, after trying this numerous times myself (below), it works. Although the poster’s email address is invalid (leading me to believe it was spam), the technique works. So there you go. You have to decipher the less clear (or highly distorted) text, as this is from a known list; the other word is an unedited scanned original, hence unknown and not really tested.

  1. I thought I would try a test to see if what a previous commenter had written was true. If this comment gets through then I would suppose it DOES work. I have two words below: one is clearly legible, the other is not. I am assuming the less clear word is from the known list and will enter it correctly; for the clearer word I am going to enter a dummy word… here goes…

  2. It apparently works; this is actually a third test (the 2nd one that is published). I have a barely legible word (I will try to enter this correctly), and for the clearer word I am going to enter random text.

