Google is using ReCAPTCHA to decode and digitize street names and numbers



Ilie Pandia
15th January 2014, 13:45
Hello,

A CAPTCHA test is a technology used to prevent automated spam. The idea is that only a human (and not a program) can pass the test, so it blocks automatic submissions.

The most popular CAPTCHA test that I know of is reCAPTCHA, the one run by Google and widely used.

When Google started to digitize books, some of the text could not be easily decoded by automatic means, so reCAPTCHA was used for that: it asks a human to read the text, and the answer helps with the decoding.
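To make the idea concrete, here is a rough sketch of how such a scheme is commonly described: the user is shown one word the system already knows (the control) and one it does not, only the control is checked, and the guesses for the unknown word are collected as votes. This is not Google's actual code; the class name, the method names and the `votes_needed` threshold are all made up for illustration.

```python
from collections import Counter


class TwoWordCaptcha:
    """Hypothetical model of a two-word challenge (not Google's real code).

    One word has a known answer (the control); the other is text that
    software could not read, so human guesses for it are only tallied.
    """

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = control_answer
        self.votes_needed = votes_needed  # assumed threshold, purely illustrative
        self.votes = Counter()

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; record their reading of the unknown word."""
        # Only the control word can be verified, because only its answer is known.
        if control_guess.strip().lower() != self.control_answer.lower():
            return False
        # The unknown word cannot be checked, so the guess is stored as a vote.
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def consensus(self):
        """Return the most common reading once enough users agree, else None."""
        if not self.votes:
            return None
        text, count = self.votes.most_common(1)[0]
        return text if count >= self.votes_needed else None


captcha = TwoWordCaptcha("morning")
captcha.submit("morning", "190")   # passes, one vote for "190"
captcha.submit("morning", "196")   # also passes: the second word is never checked
captcha.submit("mourning", "190")  # fails on the control word, vote is discarded
captcha.submit("morning", "190")
captcha.submit("morning", "190")
print(captcha.consensus())         # the majority reading wins
```

In a model like this, a single wrong reading still passes the test, but it only changes the outcome if enough other people give the same wrong answer.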

All was fine for a while, until I noticed that the text we are asked to decode is now actually street numbers, street names and various locations.

This got me a bit worried... Why would Google need that? It is certainly not to decode and digitize books. So I searched for it, and here is what Google apparently said about this:



We’re currently running an experiment in which characters from Street View images are appearing in CAPTCHAs. We often extract data such as street names and traffic signs from Street View imagery to improve Google Maps with useful information like business addresses and locations.

Based on the data and results of these reCAPTCHA tests, we’ll determine if using imagery might also be an effective way to further refine our tools for fighting machine and bot-related abuse online.

(source (http://www.theregister.co.uk/2012/04/04/google_recaptcha_street_view/) from April 2012)

It seems benign enough, and even helpful if you use Google Maps, but with all the NSA spying that's now coming out into the open I am personally concerned that this mechanism is used for spying, so we may inadvertently be helping Google (and other interested agencies) decode location information that they would otherwise have to spend their own resources on!

What to do about it?

There are 3 options that I can think of (but surely not the only ones):

1. Refresh the captcha until you get a picture that seems to be text from a book or newspaper. This may take a lot of refreshing.

2. Mis-decode the house number: if you see "190", decode it as "196". The system will not be able to tell the difference (presumably it does not yet know the right answer for that image) and you will still pass the test. This amounts to boycotting the effort, and not everybody may agree with that, since digitizing is not evil if it's used properly. It is only the abuse of this information that does not sit right with me.

3. Stop using the service that requires you to pass that test, and possibly contact the webmaster with your concerns. He/she may be able to bypass it for you. Be polite though: that test is there to keep spammers out, so try not to sound like one :).

Sources with images that have captchas with street numbers:
http://www.slashgear.com/street-view-signs-and-house-numbers-get-used-in-recaptcha-30220700/
http://techcrunch.com/2012/03/29/google-now-using-recaptcha-to-decode-street-view-addresses/
http://www.theregister.co.uk/2012/04/04/google_recaptcha_street_view/

sigma6
15th January 2014, 17:32
Brilliant insights... totally agree. If the technology is there, it is being abused by the Military Industrial Complex, funded by the taxpayers. There is just too much money and power involved, until the people wake up and take back their control via their votes or, better yet, their pocketbooks.

They may counter #1 by limiting retries, but that would be counterproductive, since most people can't read the first several anyway (at least I can't half the time!). I love #2 LOLOLOL.... but the masses will never pick up on it, they can barely pick their noses.

Mad Hatter
20th January 2014, 08:35
Another option maybe, assuming you own the property... take out common law copyright on any image or derivative thereof, then go the buggers on that score.
One might also consider, while you're at it, taking out the same on your face, your DNA, your signatures, etc.