Solving the CF issue - once and for all #13089

Open
EnginePod opened this issue May 14, 2017 · 2 comments

@EnginePod EnginePod commented May 14, 2017

As some of you may know, Cloudflare (CF) has an "I'm Under Attack" mode that serves a JS challenge before letting a visitor through.
A lot of sites use it to mitigate DDoS attacks, but as a side effect it also blocks youtube-dl, since youtube-dl has no function that solves the JS challenge.

It's caused me and other users a lot of annoyance (see: #11572), since we have to visit the page manually, copy the cookies from the browser to a text file, and give youtube-dl both the location of the cookie file and the exact user agent. The cookie stops working if your IP changes (if you have a dynamic IP like mine), if the user agent changes (common when the browser is updated) or when the auth expires.

There is a solution written in Python called cfscrape (https://github.com/Anorov/cloudflare-scrape/blob/master/cfscrape/__init__.py), but it seems to use Node.js to solve the CF challenge, which I think is overkill since the JS challenge can be parsed and solved manually.


So instead of having youtube-dl return a 503 error, it would:

  1. Upon a 503 on the first request, check whether the page is a Cloudflare challenge
  • Can easily be done by checking the source for "challenge-form" && "cloudflare" (and a few other ways)
  2. Extract the parameters from the HTML form:
  <form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
    <input type="hidden" name="jschl_vc" value="2zd281b23ba38136ffa257335eccbbeb"/>
    <input type="hidden" name="pass" value="1493771132.19-OPc3Nc0Sh+"/>
    <input type="hidden" id="jschl-answer" name="jschl_answer"/>
  </form>
  3. Construct the URL
    https://website-name.tld/cdn-cgi/l/chk_jschl?jschl_vc=2zd281b23ba38136ffa257335eccbbeb&pass=1493771132.19-OPc3Nc0Sh+&jschl_answer=%s

  4. Now only the jschl_answer is needed, which involves going through some obfuscated code:

var s, t, o, p, b, r, e, a, k, i, n, g, f, obfuscatedVariable= {
  "AL": +((!+[] + !![] + !![] + !![]) + (!+[] + !![] + !![] + !![] + !![] + !![] + !![]))
};
t = document.createElement('div');
t.innerHTML = "<a href='/'>x</a>";
t = t.firstChild.href;
r = t.match(/https?:\/\//)[0];
t = t.substr(r.length);
t = t.substr(0, t.length - 1);
a = document.getElementById('jschl-answer');
f = document.getElementById('challenge-form');;
obfuscatedVariable.AL -= +((!+[] + !![] + !![] + []) + (+[]));
obfuscatedVariable.AL += !+[] + !![] + !![];
obfuscatedVariable.AL -= +((!+[] + !![] + !![] + []) + (!+[] + !![] + !![] + !![] + !![]));
obfuscatedVariable.AL += +((!+[] + !![] + !![] + !![] + []) + (!+[] + !![] + !![] + !![] + !![] + !![]));
obfuscatedVariable.AL *= +((!+[] + !![] + []) + (!+[] + !![] + !![] + !![] + !![] + !![] + !![] + !![]));
obfuscatedVariable.AL *= !+[] + !![];
obfuscatedVariable.AL += +((+!![] + []) + (!+[] + !![] + !![] + !![]));
a.value = parseInt(obfuscatedVariable.AL, 10) + t.length;
'; 121'
f.submit();
  • The JS does some "obfuscated math" built from empty arrays: !![] and !+[] evaluate to true, which counts as 1 when added (e.g. !![] + !![] = 2), +[] is 0, and a trailing + [] turns a sum into a string so that two groups are concatenated as digits (e.g. "3" + 5 gives 35)
  • Once the calculations are done, the JS adds the length of the hostname (t.length), writes the final integer into the form and submits the GET request
  • When the GET request is sent, a cookie is created and the IP & user-agent combination is authorized
  • All that needs to be done is to split the strings, remove the spaces and do the math manually [since eval() in other languages won't give the same result as JS].

I don't know Python; otherwise I'd have taken a shot at writing a solution for it.
A very good way to see the solution or experiment with the code is to run it in the browser's JS console.
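
For anyone who wants to try the arithmetic outside a browser, here is a minimal Python sketch of the manual evaluation described in step 4. The names eval_group and eval_expr are made up for illustration, and the sketch only understands the expression shapes that appear in the sample challenge above; anything else raises an error.

```python
import operator
import re

# One term of the obfuscated arithmetic: !+[] and !![] are true (1),
# +[] is 0, and a bare [] marks string coercion of the running sum.
TERM = re.compile(r'!\+\[\]|\+?!!\[\]|\+\[\]|\[\]')
# Two parenthesised groups wrapped in a unary plus: +((A)+(B))
PAIR = re.compile(r'\+\(\(([^()]*)\)\+\(([^()]*)\)\)')
OPS = {'+=': operator.add, '-=': operator.sub, '*=': operator.mul}


def eval_group(group):
    """Evaluate a parenthesis-free group; returns a str if it contains a bare []."""
    total, pos, as_string = 0, 0, False
    while pos < len(group):
        if pos > 0:
            if group[pos] != '+':
                raise ValueError('unexpected input: %r' % group[pos:])
            pos += 1  # skip the '+' that separates two terms
        match = TERM.match(group, pos)
        if match is None:
            raise ValueError('unexpected input: %r' % group[pos:])
        token = match.group(0)
        if token == '[]':
            as_string = True
        elif token != '+[]':
            total += 1
        pos = match.end()
    return str(total) if as_string else total


def eval_expr(expr):
    """Evaluate either a plain sum or a +((A)+(B)) pair to an integer."""
    expr = re.sub(r'\s+', '', expr)
    pair = PAIR.fullmatch(expr)
    if pair is None:
        return int(eval_group(expr))
    left, right = (eval_group(g) for g in pair.groups())
    if isinstance(left, str) or isinstance(right, str):
        return int(str(left) + str(right))  # JS string concatenation
    return left + right


# The operations from the sample challenge above, in order.
initial = '+((!+[] + !![] + !![] + !![]) + (!+[] + !![] + !![] + !![] + !![] + !![] + !![]))'
steps = [
    ('-=', '+((!+[] + !![] + !![] + []) + (+[]))'),
    ('+=', '!+[] + !![] + !![]'),
    ('-=', '+((!+[] + !![] + !![] + []) + (!+[] + !![] + !![] + !![] + !![]))'),
    ('+=', '+((!+[] + !![] + !![] + !![] + []) + (!+[] + !![] + !![] + !![] + !![] + !![]))'),
    ('*=', '+((!+[] + !![] + []) + (!+[] + !![] + !![] + !![] + !![] + !![] + !![] + !![]))'),
    ('*=', '!+[] + !![]'),
    ('+=', '+((+!![] + []) + (!+[] + !![] + !![] + !![]))'),
]
value = eval_expr(initial)
for op, expr in steps:
    value = OPS[op](value, eval_expr(expr))
print(value)  # -266
```

The form value is this number plus the length of the hostname, so for website-name.tld (16 characters) it would be -250.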

  5. Add the final cookie to the cache for the domain:
  • Next time youtube-dl runs into a Cloudflare challenge, it checks the domain
  • If the domain and user-agent match the details in the cache, it uses the cached cookie
  • If the cookie for some reason doesn't bypass the auth page, it solves a new challenge and replaces the old cookie in the cache

This way it doesn't solve a challenge for every single request.
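
The caching idea in the last step could look roughly like the sketch below. ClearanceCache is a hypothetical name, not something youtube-dl provides, and the max_age default is an assumption: the real lifetime of the cookie is decided by the site.

```python
import time


class ClearanceCache:
    """In-memory cache of solved-challenge cookies, keyed by domain and user agent."""

    def __init__(self, max_age=1800):
        # max_age is an assumed lifetime in seconds; the real one is set by the site.
        self.max_age = max_age
        self._entries = {}

    def get(self, domain, user_agent, now=None):
        """Return the cached cookies, or None if missing or too old."""
        now = time.time() if now is None else now
        entry = self._entries.get((domain, user_agent))
        if entry is None:
            return None
        stored_at, cookies = entry
        if now - stored_at > self.max_age:
            del self._entries[(domain, user_agent)]
            return None
        return cookies

    def put(self, domain, user_agent, cookies, now=None):
        """Store (or replace) the cookies for this domain and user agent."""
        now = time.time() if now is None else now
        self._entries[(domain, user_agent)] = (now, dict(cookies))

    def invalidate(self, domain, user_agent):
        """Drop an entry, e.g. after the cached cookie failed to bypass the page."""
        self._entries.pop((domain, user_agent), None)
```

A caller would try get() first, solve a new challenge only when it returns None or the cached cookie still produces a challenge page, and then put() the fresh cookie.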


I'm not an expert coder, but my idea was to add this "solver" to the HTTP(S) download layer and have it check the page source whenever a 503 is returned.
That way it could solve challenges from every site without having to edit all the extractor files.
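
As a rough sketch of what such a hook could look like with Python's standard urllib (not youtube-dl's actual download code), the handler below lets a 503 response through instead of raising, and the helper functions cover steps 1 to 3 of the proposal. All names here are hypothetical, and the regex assumes the hidden inputs are written exactly as in the sample form above.

```python
import re
import urllib.request
from urllib.parse import urlencode, urljoin


class Pass503Handler(urllib.request.BaseHandler):
    """Return the 503 response to the caller instead of raising HTTPError."""

    def http_error_503(self, req, fp, code, msg, hdrs):
        return fp


def is_cf_challenge(html):
    """Step 1: the two markers suggested above, matched case-insensitively."""
    lowered = html.lower()
    return 'challenge-form' in lowered and 'cloudflare' in lowered


def hidden_inputs(html):
    """Step 2: collect the name/value pairs of hidden inputs that carry a value."""
    pattern = r'<input type="hidden" name="([^"]+)" value="([^"]*)"'
    return dict(re.findall(pattern, html))


def answer_url(page_url, params, answer):
    """Step 3: build the chk_jschl URL; urlencode escapes '+' in the pass value."""
    query = dict(params, jschl_answer=answer)
    return urljoin(page_url, '/cdn-cgi/l/chk_jschl') + '?' + urlencode(query)


# An opener that hands 503 pages back so they can be inspected.
opener = urllib.request.build_opener(Pass503Handler())
```

With this opener, a caller could read the body of a 503 response, run is_cf_challenge() on it, and only then go on to extract the parameters and solve the challenge.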

@oczkers oczkers commented Jul 31, 2017

Hi, I've made a simple Python script to solve this challenge. Use or change it freely, but please mention my name in the commit.

https://github.com/oczkers/pycfl/blob/master/pycfl.py

>>> _cf('+((!+[]+!![]+[])+(+!![]))')
1+1+0
1
'21'

>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]))')
1+1+1+0
1+1
'32'

>>> _cf('!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]')
1+1+1+1+1+1+1+1
'8'

>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+0
1+1+1+1+1+1+1
'37'

>>> _cf('+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+1+0
1+1+1+1+1+1+1
'47'

>>> _cf('!+[]+!![]+!![]+!![]+!![]+!![]')
1+1+1+1+1+1
'6'

>>> _cf('+((+!![]+[])+(!+[]+!![]+!![]))')
1+0
1+1+1
'13'

>>> _cf('+((!+[]+!![]+!![]+[])+(+[]))')
1+1+1+0
0
'30'

>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+0
1+1+1+1+1+1
'36'

>>> _cf('+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]))')
1+1+0
1+1+1+1+1
'25'

>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+0
1+1+1+1+1+1+1+1
'38'
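
The expressions passed to _cf above first have to be pulled out of the challenge script. The sketch below shows one way that extraction might be done with regular expressions; parse_challenge is a hypothetical helper, not part of pycfl, and it assumes the script keeps the layout quoted in the opening post (an object with one quoted key, followed by `name.key op= expression;` statements).

```python
import re

# Matches: someName = { "KEY": <expression> }
INIT = re.compile(r'(\w+)\s*=\s*\{\s*"(\w+)"\s*:\s*([^}]+?)\s*\}')


def parse_challenge(script):
    """Return (initial_expression, [(operator, expression), ...])."""
    init = INIT.search(script)
    if init is None:
        raise ValueError('challenge object not found')
    name, key, first = init.groups()
    # Matches: someName.KEY -= <expression>;  (also += and *=)
    step = re.compile(r'%s\.%s\s*([-+*]=)\s*([^;]+);'
                      % (re.escape(name), re.escape(key)))
    return first, step.findall(script)


sample = '''
var s, t, o, p, b, r, e, a, k, i, n, g, f, obfuscatedVariable= {
  "AL": +((!+[] + !![]) + (!+[] + !![] + !![]))
};
obfuscatedVariable.AL -= +((!+[] + !![] + []) + (+[]));
obfuscatedVariable.AL *= !+[] + !![];
a.value = parseInt(obfuscatedVariable.AL, 10) + t.length;
'''
first, operations = parse_challenge(sample)
for op, expr in operations:
    print(op, expr)
```

Each extracted expression can then be evaluated on its own and applied in order with the matching operator.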
Contributor

@Tatsh Tatsh commented Dec 8, 2017

I made it work, but this code requires launching Node to evaluate the JS. And yes, you really do have to wait 5 seconds before sending the answer, so using --cookies is still arguably the better option. A persistent cookie storage mechanism in youtube-dl would be very useful for this. ._set_cookie() does not persist because youtube-dl launches a new instance for every URL passed in.

The code below is not perfect: the regex sometimes fails, but it works most of the time. Change CF_DOMAIN and CF_COOKIE_DOMAIN, set up _VALID_URL, and it should work.

# coding: utf-8
from __future__ import unicode_literals
from collections import OrderedDict
from datetime import datetime
import re
import subprocess as sp
import time

from .common import InfoExtractor
from ..utils import ExtractorError, int_or_none


class MyIE(InfoExtractor):
    CF_DOMAIN = 'domain.name'
    CF_COOKIE_DOMAIN = '.domain.name'

    def set_downloader(self, downloader):
        self._downloader = downloader
        if downloader:
            # Add a 503 handler so we get the response
            class Handle503:
                def http_error_503(self, request, response, code, msg, hdrs):
                    return response
            self._downloader._opener.handle_error['http'][503] = [Handle503()]

    def _real_extract(self, url):
        # CloudFlare crap
        add_length = str(len(self.CF_DOMAIN))
        note = ('Downloading CloudFlare page (waiting 5 seconds to answer '
                'challenge)')
        webpage, urlh = self._download_webpage_handle('https://{}'.format(self.CF_DOMAIN),
                                                      '',
                                                      note=note)
        if urlh.getcode() == 503:
            # .get() works on both Python 2 and 3; split once in case the value contains '='
            cfduid_value = urlh.headers.get('set-cookie').split(';')[0].split('=', 1)[1]
            hiddens = self._form_hidden_inputs('challenge-form', webpage)
            obj = add = var_name = None
            for line in webpage.split('\n'):
                line = line.strip()
                if 's,t,o,p,b,r,e,a,k,i,n,g' in line:
                    match = re.search(r',([^=]+)=({[^\}]+\})', line)
                    var_name = match.group(1).split(',')[-1].strip()
                    obj = match.group(2)
                if obj and '+=' in line and '![' in line:
                    add = line.replace(' t.length', add_length).replace('a.value =', 'x =')[1:]
                    add = add.replace('\n', '')
                    add = add.replace('.action += location.hash;', '')
                    break

            if obj and add and var_name:
                js = '{}={};{};console.log(x)\n'.format(var_name, obj, add)
                p = sp.Popen(['node'], stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
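                # communicate() needs bytes on Python 3; decode Node's output back to text
                answer = p.communicate(js.encode('utf-8'))[0].decode('utf-8').strip()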
                if not answer:
                    raise ExtractorError('No answer (please try again)! JS: {}'.format(js))

                # Remove the 503 handler because this should work now
                del self._downloader._opener.handle_error['http'][503]

                headers = {'referer': url,}
                self._set_cookie(self.CF_COOKIE_DOMAIN, '__cfduid', cfduid_value)
                query = hiddens
                query['jschl_answer'] = answer
                time.sleep(5)
                content, urlh = self._download_webpage_handle('https://{}/cdn-cgi/l/chk_jschl'.format(self.CF_DOMAIN),
                                                            '',
                                                            note='Answer CloudFlare challenge',
                                                            query=query,
                                                            headers=headers)
                for header, cookies in urlh.headers.items():
                    if header.lower() != 'set-cookie':
                        continue
                    # Split only on the first '=' so values containing '=' do not break unpacking
                    key, value = cookies.split(';')[0].split('=', 1)
                    self._set_cookie(self.CF_COOKIE_DOMAIN, key, value)
            else:
                raise ExtractorError('Failed to get parameters for CloudFlare challenge (please try again)')

        # Rest of the extractor here
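
On the persistent cookie storage mentioned above: outside of youtube-dl, Python's standard http.cookiejar can already save and reload cookies in the Netscape text format. The sketch below is only an illustration of that mechanism, not an existing youtube-dl feature. The helper names and the 1800-second default lifetime are assumptions.

```python
import http.cookiejar
import time


def load_jar(path):
    """Open a Netscape-format cookie file, tolerating a missing file."""
    jar = http.cookiejar.MozillaCookieJar(path)
    try:
        jar.load(ignore_discard=True, ignore_expires=True)
    except FileNotFoundError:
        pass  # first run: nothing has been saved yet
    return jar


def remember(jar, domain, name, value, lifetime=1800):
    """Add one cookie to the jar and write the jar back to disk."""
    cookie = http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False,
        expires=int(time.time()) + lifetime,  # assumed lifetime
        discard=False, comment=None, comment_url=None, rest={})
    jar.set_cookie(cookie)
    jar.save(ignore_discard=True, ignore_expires=True)
```

A later run that calls load_jar() with the same path would get the stored cookie back, as long as the IP and user agent have not changed in the meantime.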