Solving the CF issue - once and for all #13089
Hi, I've made a simple Python script to solve this challenge. Use or change it freely, but please mention my name in the commit. https://github.com/oczkers/pycfl/blob/master/pycfl.py
>>> _cf('+((!+[]+!![]+[])+(+!![]))')
1+1+0
1
'21'
>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]))')
1+1+1+0
1+1
'32'
>>> _cf('!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]')
1+1+1+1+1+1+1+1
'8'
>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+0
1+1+1+1+1+1+1
'37'
>>> _cf('+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+1+0
1+1+1+1+1+1+1
'47'
>>> _cf('!+[]+!![]+!![]+!![]+!![]+!![]')
1+1+1+1+1+1
'6'
>>> _cf('+((+!![]+[])+(!+[]+!![]+!![]))')
1+0
1+1+1
'13'
>>> _cf('+((!+[]+!![]+!![]+[])+(+[]))')
1+1+1+0
0
'30'
>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+0
1+1+1+1+1+1
'36'
>>> _cf('+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]))')
1+1+0
1+1+1+1+1
'25'
>>> _cf('+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))')
1+1+1+0
1+1+1+1+1+1+1+1
'38'
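The pattern in the session above can be reproduced with very little code. What follows is a minimal sketch, not the pycfl implementation: `decode_obfuscated_number` is a hypothetical name, and it assumes the expression only uses the forms shown above, where `!+[]` and `!![]` each count as 1, `+[]` and `[]` count as 0, and each parenthesised group yields one decimal digit.

```python
import re

# Hypothetical helper for illustration; not taken from pycfl.
_ONE = re.compile(r'!\+\[\]|!!\[\]')   # !+[] and !![] both evaluate to true, i.e. 1
_GROUP = re.compile(r'\(([^()]*)\)')   # innermost parenthesised groups


def decode_obfuscated_number(expr):
    """Decode expressions like '+((!+[]+!![]+[])+(+!![]))' into an int."""
    groups = _GROUP.findall(expr)
    if not groups:
        # Plain sum such as '!+[]+!![]+!![]': just count the ones.
        return len(_ONE.findall(expr))
    # Each group is a sum of ones that becomes one digit; the digits are
    # concatenated as strings and converted back to a number.
    digits = ''.join(str(len(_ONE.findall(group))) for group in groups)
    return int(digits)


if __name__ == '__main__':
    print(decode_obfuscated_number('+((!+[]+!![]+[])+(+!![]))'))
    print(decode_obfuscated_number('!+[]+!![]+!![]+!![]+!![]+!![]'))
```

This only covers the digit-building expressions; the arithmetic that the challenge script applies to them afterwards is left out.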
I made it work, but this code requires launching Node to evaluate the JS. And yes, you really have to wait 5 seconds before sending the answer, so using [...] This code is not perfect. Sometimes the regex fails, but it works most of the time. Change CF_DOMAIN and CF_COOKIE_DOMAIN, set up [...]
# coding: utf-8
from __future__ import unicode_literals
from collections import OrderedDict
from datetime import datetime
import re
import subprocess as sp
import time
from .common import InfoExtractor
from ..utils import ExtractorError, int_or_none
class MyIE(InfoExtractor):
    CF_DOMAIN = 'domain.name'
    CF_COOKIE_DOMAIN = '.domain.name'
    def set_downloader(self, downloader):
        self._downloader = downloader
        if downloader:
            # Add a 503 handler so we get the response
            class Handle503:
                def http_error_503(self, request, response, code, msg, hdrs):
                    return response
            self._downloader._opener.handle_error['http'][503] = [Handle503()]
    def _real_extract(self, url):
        # CloudFlare crap
        add_length = str(len(self.CF_DOMAIN))
        note = ('Downloading CloudFlare page (waiting 5 seconds to answer '
                'challenge)')
        webpage, urlh = self._download_webpage_handle('https://{}'.format(self.CF_DOMAIN),
                                                      '',
                                                      note=note)
        if urlh.getcode() == 503:
            cfduid_value = urlh.headers.getheader('set-cookie').split(';')[0].split('=')[1]
            hiddens = self._form_hidden_inputs('challenge-form', webpage)
            obj = add = var_name = None
            for line in webpage.split('\n'):
                line = line.strip()
                if 's,t,o,p,b,r,e,a,k,i,n,g' in line:
                    match = re.search(r',([^=]+)=({[^\}]+\})', line)
                    var_name = match.group(1).split(',')[-1].strip()
                    obj = match.group(2)
                if obj and '+=' in line and '![' in line:
                    add = line.replace(' t.length', add_length).replace('a.value =', 'x =')[1:]
                    add = add.replace('\n', '')
                    add = add.replace('.action += location.hash;', '')
                    break
            if obj and add and var_name:
                js = '{}={};{};console.log(x)\n'.format(var_name, obj, add)
                p = sp.Popen(['node'], stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
                answer = p.communicate(js)[0].strip()
                if not answer:
                    raise ExtractorError('No answer (please try again)! JS: {}'.format(js))
                # Remove the 503 handler because this should work now
                del self._downloader._opener.handle_error['http'][503]
                headers = {'referer': url,}
                self._set_cookie(self.CF_COOKIE_DOMAIN, '__cfduid', cfduid_value)
                query = hiddens
                query['jschl_answer'] = answer
                time.sleep(5)
                content, urlh = self._download_webpage_handle('https://{}/cdn-cgi/l/chk_jschl'.format(self.CF_DOMAIN),
                                                              '',
                                                              note='Answer CloudFlare challenge',
                                                              query=query,
                                                              headers=headers)
                for header, cookies in urlh.headers.items():
                    if header.lower() != 'set-cookie':
                        continue
                    key, value = cookies.split(';')[0].split('=')
                    self._set_cookie(self.CF_COOKIE_DOMAIN, key, value)
            else:
                raise ExtractorError('Failed to get parameters for CloudFlare challenge (please try again)')
        # Rest of the extractor here
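For anyone who wants to try the Node step outside of youtube-dl, here is a rough standalone sketch for Python 3.7+. The names `build_challenge_js` and `eval_with_node` are made up for illustration; the snippet format mirrors the `'{}={};{};console.log(x)\n'` string from the extractor above, and it assumes a `node` binary may or may not be on the PATH.

```python
import shutil
import subprocess


def build_challenge_js(var_name, obj, add):
    """Assemble the JS snippet the same way the extractor above does."""
    return '{}={};{};console.log(x)\n'.format(var_name, obj, add)


def eval_with_node(js, timeout=10):
    """Feed a JS snippet to node on stdin and return its stdout.

    Returns None when node is not installed, so callers can fall back
    to another strategy instead of crashing.
    """
    if shutil.which('node') is None:
        return None
    result = subprocess.run(['node'], input=js, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout.strip()


if __name__ == '__main__':
    # Toy input only; a real challenge page supplies var_name, obj and add.
    snippet = build_challenge_js('f', '{"k": 2}', 'x = f.k + 3')
    print(eval_with_node(snippet))
```

Passing `text=True` keeps the input and output as `str`, which avoids the bytes handling that a bare `Popen.communicate` call needs on Python 3.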
As some of you may know, CF has a mode called "I'm under attack" which shows a JS challenge.
A lot of sites are using this to prevent DDoS attacks, but it's accidentally also stopping youtube-dl since there's no function that solves the JS challenge.
It's caused me and other users a lot of annoyances (see: #11572) since we're having to visit the page manually, copy the cookies from the browser to a text file and specify the location of the cookies and the exact user-agent in youtube-dl. The cookie stops working if your IP changes (if you have a dynamic IP like mine), if the user agent changes (common when the browser is updated) or when the auth expires.
Now there is a solution written in Python called cfscrape (https://github.com/Anorov/cloudflare-scrape/blob/master/cfscrape/__init__.py), but it seems to be using node.js to solve the CF challenge which I think is overkill since the JS challenge can be parsed and solved manually.
So instead of having youtube-dl return a 503 error, it would:
Construct the URL
https://website-name.tld/cdn-cgi/l/chk_jschl?jschl_vc=2zd281b23ba38136ffa257335eccbbeb&pass=1493771132.59-OPc4Nn0Sh+&jschl_answer=%s
Now only the jschl_answer is needed, which involves going through some obfuscated code: [...] !![] + !![] = 2)
I don't know Python, otherwise I'd have taken a shot at writing a solution for it.
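The URL construction step could look roughly like this in Python 3. This is a sketch under assumptions: `build_answer_url` is a hypothetical helper, and the hidden field names (`jschl_vc`, `pass`) are simply the ones visible in the example URL above.

```python
from urllib.parse import urlencode


def build_answer_url(domain, hidden_fields, answer):
    """Build the chk_jschl URL from the form's hidden fields plus the answer."""
    query = dict(hidden_fields)       # copy so the caller's dict is not modified
    query['jschl_answer'] = answer
    # urlencode escapes characters such as '+' that would otherwise be
    # read as spaces on the server side.
    return 'https://{}/cdn-cgi/l/chk_jschl?{}'.format(domain, urlencode(query))


if __name__ == '__main__':
    fields = {'jschl_vc': '2zd281b23ba38136ffa257335eccbbeb',
              'pass': '1493771132.59-OPc4Nn0Sh+'}
    print(build_answer_url('website-name.tld', fields, 21))
```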
A very good way to see the solution or experiment with the code is to run it in the console.
This way it doesn't solve a challenge for every single request
I'm not an expert coder, but my idea was to add this "solver" to the HTTP(s) download libraries and have it check the source when a 503 is sent.
This way it would be able to solve challenges from every single site without having to edit all the extractor files.
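As a rough illustration of that idea with the standard library only (Python 3), a handler can hand back the 503 response instead of raising, so the body can be checked for a challenge. `Return503Handler` and `looks_like_challenge` are hypothetical names, and treating the `challenge-form` marker as the sign of a challenge page is an assumption based on the form id used in the extractor posted in this thread.

```python
import urllib.request


class Return503Handler(urllib.request.BaseHandler):
    """Return the 503 response instead of raising HTTPError."""

    def http_error_503(self, req, fp, code, msg, hdrs):
        return fp


def looks_like_challenge(status, body):
    """Guess whether a response is a JS challenge page (assumed marker)."""
    return status == 503 and 'challenge-form' in body


def build_opener():
    # Every request made through this opener gets the 503 handling,
    # so individual extractors would not need to change.
    return urllib.request.build_opener(Return503Handler())
```

With something like this in the shared download code, a solver would only run when `looks_like_challenge` returns true, and ordinary 503 errors could still be reported as before.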