Sooner or later a new generation of spam protection methods will emerge to block all unwanted site visitors. The recently launched Google “No CAPTCHA reCaptcha”, or ReCaptcha v2.0, could be just such a method.
This new behaviour analysis tool is getting more and more attention, both from site owners and from scraping developers who are trying to break it. Since Google does not reveal any secrets of its operation, we want to share with you the techniques used in this new smart analysis CAPTCHA that distinguishes between bots and humans. Let’s look inside.
How does No CAPTCHA reCaptcha work
When a user moves to and ticks the “I’m not a robot” checkbox, that behaviour drives even more browser events. These are caught by the reCaptcha script, and a request with an encoded payload is sent to the Google server, where the user’s fingerprint is recorded and cookies are stored.
The behaviour analysis system on the Google server analyses the data provided and returns an encoded value to the client page. This value is user- and time-dependent.
In case of confusion (or bot-like behavior) Google’s server will ask the client to complete an additional image-check CAPTCHA (see picture below) to further verify whether the user is a bot or not.
The encoded value carries hidden information about whether the user has been verified. Your site still needs to find out whether Google has verified that user on that page. To check, send a POST request from your server with the following parameters: the returned encoded value, the secret key and the end user’s IP (the last one is optional). Read the details on how to fetch and verify the user’s response.
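The verification request described above can be sketched in Python as follows. This is a minimal sketch, not the official client: the endpoint URL and the parameter names (`secret`, `response`, `remoteip`) come from Google’s public reCAPTCHA documentation rather than from this post, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Verification endpoint as given in Google's reCAPTCHA documentation.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_payload(secret_key, token, remote_ip=None):
    """Encode the POST body: secret key, returned value, optional user IP."""
    fields = {"secret": secret_key, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    return urllib.parse.urlencode(fields).encode("ascii")


def is_verified(response_body):
    """Return True only if the JSON reply explicitly reports success."""
    try:
        reply = json.loads(response_body)
    except ValueError:
        # Anything that is not valid JSON is treated as "not verified".
        return False
    return isinstance(reply, dict) and reply.get("success") is True


def verify_token(secret_key, token, remote_ip=None, timeout=10):
    """Send the POST request and report whether Google verified the user."""
    payload = build_verify_payload(secret_key, token, remote_ip)
    with urllib.request.urlopen(VERIFY_URL, data=payload, timeout=timeout) as resp:
        return is_verified(resp.read().decode("utf-8"))
```

Because the secret key is part of the request, a call such as `verify_token(secret_key, token, user_ip)` belongs in server-side code, never in the page itself.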
Cases in which a second image-check is required
Suspicious, bot-like behavior in the initial test. “In cases when the risk analysis engine can’t confidently predict whether a user is a human or an abusive agent, it will prompt a CAPTCHA to elicit more cues, increasing the number of security checkpoints to confirm the user is valid.” (from the Google reCaptcha page)
Expiration of time is also handled by the new reCaptcha. If there is no response from the client for a while, reCaptcha pops up an additional image-check puzzle.
ReCaptcha application on mobile devices. The website will show you images for comparison/selection and you will be verified upon single or multiple tap(s).
Criteria of engine verification analysis
For this new type of CAPTCHA the main evidence is browser behaviour, rather than the checkbox value:
- mouse movement, its slightness and straightness
- page scrolls
- time intervals between browser events
- click location history tied to user fingerprint
All these criteria are stored in the browser’s cookie and processed by Google’s server to discern bots from humans. It is pretty hard for bots to mimic the browser behavior of humans. This technique is far more advanced than the old CAPTCHA spam protection methods, which for the most part can be solved using today’s technology.
Today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy. Thus distorted text, on its own, is no longer a dependable test (according to Google research).
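Google does not publish how it scores these signals, so the following is only a toy illustration of how two of the criteria listed above, the straightness of mouse movement and the time intervals between browser events, could be turned into numbers. The function names and formulas are assumptions made for this sketch and are not part of reCaptcha.

```python
import math


def path_straightness(points):
    """Ratio of straight-line distance to travelled distance.

    1.0 means a perfectly straight path; smaller values mean a more
    wandering path. `points` is a list of (x, y) mouse positions.
    """
    if len(points) < 2:
        return 1.0
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def event_intervals(timestamps_ms):
    """Gaps in milliseconds between consecutive browser events."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
```

A script that jumps the pointer straight to the checkbox at fixed intervals would yield a straightness of 1.0 and identical gaps, while a human trace usually wanders and has irregular timing.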
Some more on the behavior captcha
Some readers are perplexed: If the software is capable of differentiating between bots and humans before presenting CAPTCHAs, then what is the point of the CAPTCHA?
ReCaptcha 2.0 is smart. Really smart. How much CAPTCHA solving users are asked to do depends on how human-like their behaviour is. If the risk assessment engine does not have enough evidence that a user is a human, it adds a further challenge (an image CAPTCHA) for final verification. This method should remove the usual frustrations we humans feel when confronted with the traditional super-distorted text CAPTCHAs.
Want it? Register with Google to integrate it
At this point, I believe, many readers are eager to get this new generation CAPTCHA on their sites. Prior to using it, you need to register your site (proving your site ownership) with Google’s reCaptcha service. Upon success you’ll be issued the reCaptcha credentials (a site key and a secret key). The site key is later integrated into the form with reCaptcha (follow the steps of the reCaptcha management after signup), while the secret key is needed for final verification by your server. A PHP library is available for integrating reCaptcha into a website.
In the following post we’ve described how to integrate it on your site and make it work.
The simplest form with reCaptcha code
<script src="https://www.google.com/recaptcha/api.js"></script>
<form method="post">
  <div class="g-recaptcha" data-sitekey="[site key issued by google]"></div>
  <input value="submit" type="submit" />
</form>
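When the form above is submitted, the widget adds the encoded value to the POST data. According to Google’s documentation the field is named `g-recaptcha-response`; the helper below is a hypothetical sketch of how a server might pull that value out of a URL-encoded form body before verifying it.

```python
import urllib.parse

# Field name the reCaptcha widget adds to the enclosing form.
TOKEN_FIELD = "g-recaptcha-response"


def extract_token(form_body):
    """Return the reCaptcha value from a urlencoded POST body, or None."""
    # parse_qs drops blank values by default, so an empty field yields None.
    fields = urllib.parse.parse_qs(form_body)
    values = fields.get(TOKEN_FIELD, [])
    return values[0] if values else None
```

A missing or empty value means the checkbox was never completed, so the submission can be rejected without contacting Google at all.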
Need to break it?
At the same time, I am sure some web scraping developers and businesses would like to find a way of breaking through this type of CAPTCHA.
We’ve managed to write an iMacros script that breaks reCaptcha through a brute-force approach. Selenium has also helped here.
In the following posts, we’ll explore some software and services that might be able to break this new CAPTCHA. So, stay tuned! If you want to help us test drive these methods, please let me know in the comments.
ReCaptcha v2.0 is without doubt a nice and powerful tool for protection against spam and web scraping. Google has finally created a good user experience for sites that rely on CAPTCHA. Yet, I believe, both human-labour CAPTCHA solving services and programmatic CAPTCHA solving systems will continue to fight and break this new invention in the endless human-bot competition.