What happens on the wire
First, the browser makes the following requests:
- https://www.google.com/recaptcha/api.js, whose main function is to load the next one...
- https://www.gstatic.com/recaptcha/api2/r20141202135649/recaptcha__en.js, which contains common code.
The browser then makes a request to https://www.google.com/recaptcha/api2/anchor, whose response contains the really interesting part: a call to a function named recaptcha.anchor.Main.init, which is passed two base64-encoded parameters.
https://www.google.com/js/bg/6yg-ggdQgQAg8SAADJkAjc-JMNnOnYuIGgH_iBV7uf8.js. The second one contains *double-*base64-encoded binary data.
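As a rough illustration of what "double-base64-encoded" means here, the sketch below simply applies base64 decoding twice. The function name decode_double_base64 is hypothetical, and it assumes the standard base64 alphabet with padding; if the real parameter uses the URL-safe alphabet, base64.urlsafe_b64decode would be needed instead.

```python
import base64


def decode_double_base64(param):
    """Decode a value that was base64-encoded twice (standard alphabet assumed)."""
    inner = base64.b64decode(param)   # first layer: yields another base64 string
    return base64.b64decode(inner)    # second layer: yields the raw binary data


if __name__ == "__main__":
    # Synthetic sample, not real ReCaptcha data.
    raw = b"\x00\x01binary payload"
    sample = base64.b64encode(base64.b64encode(raw)).decode("ascii")
    print(decode_double_base64(sample) == raw)
```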
The first parameter is the bytecode interpreter. After trimming the
')})(), and passing it to JSBeautifier, I finally dove into this mass of minified code.
The interpreter has two entry points: the M function, which is executed when ReCaptcha is loaded, and M.prototype.ha, which is executed when you click the checkbox and returns the information destined for Google's servers.
I first discovered that the bytecode is encrypted with the XTEA algorithm, used as a stream cipher: each 8-byte block is XORed with a keystream block, so encryption and decryption are the same function. Each keystream block is the XTEA encryption of a 64-bit input whose first 32-bit word is read from the bytecode file and whose second 32-bit word is the position in the bytecode file divided by 8. The key is [0, 0, 0, 0] by default.
Function.toString() rocks, doesn't it?), or with the output of browser-specific functions and CSS rules, or with the hostname of the calling domain (www.google.com)...
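The keystream scheme described above can be sketched roughly as follows. This is a minimal illustration, not the interpreter's actual code: it assumes the standard 32-round XTEA with the usual delta constant and little-endian packing of the keystream words, none of which is confirmed here. The names xtea_encrypt_block, xor_with_keystream and nonce are hypothetical; nonce stands for the 32-bit word read from the bytecode file.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9  # standard XTEA constant (assumed to be used unchanged)


def xtea_encrypt_block(v0, v1, key, rounds=32):
    """Encrypt one 64-bit block, given as two 32-bit words, with standard XTEA."""
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
    return v0, v1


def xor_with_keystream(data, nonce, key=(0, 0, 0, 0)):
    """XOR data with an XTEA keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 8):
        # Cipher input: (word read from the file, position divided by 8).
        k0, k1 = xtea_encrypt_block(nonce & MASK, offset // 8, key)
        stream = struct.pack("<II", k0, k1)  # byte order is an assumption
        chunk = data[offset:offset + 8]
        out.extend(b ^ s for b, s in zip(chunk, stream))
    return bytes(out)


if __name__ == "__main__":
    sample = b"not real bytecode"  # synthetic data for illustration only
    encrypted = xor_with_keystream(sample, 0x12345678)
    print(xor_with_keystream(encrypted, 0x12345678) == sample)
```

Because only the cipher's encryption direction is ever needed to produce the keystream, a single function serves both ways, which matches the observation that decryption and encryption are identical.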
After about 2 days of work, I produced a working disassembler and then a decompiler for the ReCaptcha bytecode. You can try it from this GitHub repository. However, it still has some hardcoded key values, so for now it will only work on the bytecode sample contained in the enc file.
Just execute the
xhr2 are byte arrays that contain the data later sent to Google servers.
Google servers will receive and process, at least, the following information:
- Screen resolution
- Execution time, timezone
- Number of click/keyboard/touch actions in the <iframe> of the captcha
- Behavior of many browser-specific functions and CSS rules
- Rendering of canvas elements
- Likely cookies server-side (it's executed on the www.google.com domain)
- And likely other stuff...
You can look at the decompiled bytecode for more detail.
This information, along with numeric values hardcoded in the bytecode (which forces a potential bot to read all of it), is sent to the https://www.google.com/recaptcha/api2/frame page. See the M.prototype.Q function for how the encoding is done. Some of the information (what I call xhr2 in the decompiler, held in this.c[this.g], whereas xhr1 is in this.c[this.d]) is also encrypted with XTEA.
- Make statistics about when the checkbox-captcha suffices and when it doesn't.
- Programmatically bypass the captcha by interpreting bytecode.
- Programmatically bypass the captcha by simply running a rendering engine and automating mouse movements. But that would be slightly less fun.
Cheers and good reversing!