Skip to content

Commit

Permalink
[rh:requests] Add handler for requests HTTP library (#3668)
Browse files Browse the repository at this point in the history
Adds support for HTTPS proxies and persistent connections (keep-alive)

Closes #1890
Resolves #4070
Resolves ytdl-org/youtube-dl#32549
Resolves ytdl-org/youtube-dl#14523
Resolves ytdl-org/youtube-dl#13734

Authored by: coletdjnz, Grub4K, bashonly
  • Loading branch information
coletdjnz committed Oct 13, 2023
1 parent 700444c commit 8a8b545
Show file tree
Hide file tree
Showing 14 changed files with 619 additions and 79 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/core.yml
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install pytest
- name: Install dependencies
run: pip install pytest -r requirements.txt
- name: Run tests
continue-on-error: False
Expand Down
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,14 +157,15 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
* yt-dlp's sanitization of invalid characters in filenames is different/smarter than in youtube-dl. You can use `--compat-options filename-sanitization` to revert to youtube-dl's behavior
* yt-dlp tries to parse the external downloader outputs into the standard progress output if possible (Currently implemented: [~~aria2c~~](https://github.com/yt-dlp/yt-dlp/issues/5931)). You can use `--compat-options no-external-downloader-progress` to get the downloader output as-is
* yt-dlp versions between 2021.09.01 and 2023.01.02 applies `--match-filter` to nested playlists. This was an unintentional side-effect of [8f18ac](https://github.com/yt-dlp/yt-dlp/commit/8f18aca8717bb0dd49054555af8d386e5eda3a88) and is fixed in [d7b460](https://github.com/yt-dlp/yt-dlp/commit/d7b460d0e5fc710950582baed2e3fc616ed98a80). Use `--compat-options playlist-match-filter` to revert this
* yt-dlp uses modern http client backends such as `requests`. Use `--compat-options prefer-legacy-http-handler` to prefer the legacy http handler (`urllib`) to be used for standard http requests.

For ease of use, a few more compat options are available:

* `--compat-options all`: Use all compat options (Do NOT use)
* `--compat-options youtube-dl`: Same as `--compat-options all,-multistreams,-playlist-match-filter`
* `--compat-options youtube-dlc`: Same as `--compat-options all,-no-live-chat,-no-youtube-channel-redirect,-playlist-match-filter`
* `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization,no-youtube-prefer-utc-upload-date`
* `--compat-options 2022`: Same as `--compat-options playlist-match-filter,no-external-downloader-progress`. Use this to enable all future compat options
* `--compat-options 2022`: Same as `--compat-options playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler`. Use this to enable all future compat options


# INSTALLATION
Expand Down Expand Up @@ -274,6 +275,7 @@ While all the other dependencies are optional, `ffmpeg` and `ffprobe` are highly
* [**certifi**](https://github.com/certifi/python-certifi)\* - Provides Mozilla's root certificate bundle. Licensed under [MPLv2](https://github.com/certifi/python-certifi/blob/master/LICENSE)
* [**brotli**](https://github.com/google/brotli)\* or [**brotlicffi**](https://github.com/python-hyper/brotlicffi) - [Brotli](https://en.wikipedia.org/wiki/Brotli) content encoding support. Both licensed under MIT <sup>[1](https://github.com/google/brotli/blob/master/LICENSE) [2](https://github.com/python-hyper/brotlicffi/blob/master/LICENSE) </sup>
* [**websockets**](https://github.com/aaugustin/websockets)\* - For downloading over websocket. Licensed under [BSD-3-Clause](https://github.com/aaugustin/websockets/blob/main/LICENSE)
* [**requests**](https://github.com/psf/requests)\* - HTTP library. For HTTPS proxy and persistent connections support. Licensed under [Apache-2.0](https://github.com/psf/requests/blob/main/LICENSE)

### Metadata

Expand Down
2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,5 @@ websockets
brotli; platform_python_implementation=='CPython'
brotlicffi; platform_python_implementation!='CPython'
certifi
requests>=2.31.0,<3
urllib3>=1.26.17,<3
9 changes: 8 additions & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,14 @@ def py2exe_params():
'compressed': 1,
'optimize': 2,
'dist_dir': './dist',
'excludes': ['Crypto', 'Cryptodome'], # py2exe cannot import Crypto
'excludes': [
# py2exe cannot import Crypto
'Crypto',
'Cryptodome',
# py2exe appears to confuse this with our socks library.
# We don't use pysocks and urllib3.contrib.socks would fail to import if tried.
'urllib3.contrib.socks'
],
'dll_excludes': ['w9xpopen.exe', 'crypt32.dll'],
# Modules that are only imported dynamically must be added here
'includes': ['yt_dlp.compat._legacy', 'yt_dlp.compat._deprecated',
Expand Down
168 changes: 134 additions & 34 deletions test/test_networking.py

Large diffs are not rendered by default.

36 changes: 18 additions & 18 deletions test/test_socks.py
Original file line number Diff line number Diff line change
Expand Up @@ -263,15 +263,15 @@ def ctx(request):


class TestSocks4Proxy:
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks4_no_auth(self, handler, ctx):
with handler() as rh:
with ctx.socks_server(Socks4ProxyHandler) as server_address:
response = ctx.socks_info_request(
rh, proxies={'all': f'socks4://{server_address}'})
assert response['version'] == 4

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks4_auth(self, handler, ctx):
with handler() as rh:
with ctx.socks_server(Socks4ProxyHandler, user_id='user') as server_address:
Expand All @@ -281,15 +281,15 @@ def test_socks4_auth(self, handler, ctx):
rh, proxies={'all': f'socks4://user:@{server_address}'})
assert response['version'] == 4

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks4a_ipv4_target(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler) as server_address:
with handler(proxies={'all': f'socks4a://{server_address}'}) as rh:
response = ctx.socks_info_request(rh, target_domain='127.0.0.1')
assert response['version'] == 4
assert (response['ipv4_address'] == '127.0.0.1') != (response['domain_address'] == '127.0.0.1')

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks4a_domain_target(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler) as server_address:
with handler(proxies={'all': f'socks4a://{server_address}'}) as rh:
Expand All @@ -298,7 +298,7 @@ def test_socks4a_domain_target(self, handler, ctx):
assert response['ipv4_address'] is None
assert response['domain_address'] == 'localhost'

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_ipv4_client_source_address(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler) as server_address:
source_address = f'127.0.0.{random.randint(5, 255)}'
Expand All @@ -308,7 +308,7 @@ def test_ipv4_client_source_address(self, handler, ctx):
assert response['client_address'][0] == source_address
assert response['version'] == 4

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
@pytest.mark.parametrize('reply_code', [
Socks4CD.REQUEST_REJECTED_OR_FAILED,
Socks4CD.REQUEST_REJECTED_CANNOT_CONNECT_TO_IDENTD,
Expand All @@ -320,7 +320,7 @@ def test_socks4_errors(self, handler, ctx, reply_code):
with pytest.raises(ProxyError):
ctx.socks_info_request(rh)

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_ipv6_socks4_proxy(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler, bind_ip='::1') as server_address:
with handler(proxies={'all': f'socks4://{server_address}'}) as rh:
Expand All @@ -329,7 +329,7 @@ def test_ipv6_socks4_proxy(self, handler, ctx):
assert response['ipv4_address'] == '127.0.0.1'
assert response['version'] == 4

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_timeout(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler, sleep=2) as server_address:
with handler(proxies={'all': f'socks4://{server_address}'}, timeout=0.5) as rh:
Expand All @@ -339,15 +339,15 @@ def test_timeout(self, handler, ctx):

class TestSocks5Proxy:

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks5_no_auth(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
response = ctx.socks_info_request(rh)
assert response['auth_methods'] == [0x0]
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks5_user_pass(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler, auth=('test', 'testpass')) as server_address:
with handler() as rh:
Expand All @@ -360,23 +360,23 @@ def test_socks5_user_pass(self, handler, ctx):
assert response['auth_methods'] == [Socks5Auth.AUTH_NONE, Socks5Auth.AUTH_USER_PASS]
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks5_ipv4_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
response = ctx.socks_info_request(rh, target_domain='127.0.0.1')
assert response['ipv4_address'] == '127.0.0.1'
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks5_domain_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
response = ctx.socks_info_request(rh, target_domain='localhost')
assert (response['ipv4_address'] == '127.0.0.1') != (response['ipv6_address'] == '::1')
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks5h_domain_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5h://{server_address}'}) as rh:
Expand All @@ -385,7 +385,7 @@ def test_socks5h_domain_target(self, handler, ctx):
assert response['domain_address'] == 'localhost'
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks5h_ip_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5h://{server_address}'}) as rh:
Expand All @@ -394,15 +394,15 @@ def test_socks5h_ip_target(self, handler, ctx):
assert response['domain_address'] is None
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_socks5_ipv6_destination(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
response = ctx.socks_info_request(rh, target_domain='[::1]')
assert response['ipv6_address'] == '::1'
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_ipv6_socks5_proxy(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler, bind_ip='::1') as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
Expand All @@ -413,7 +413,7 @@ def test_ipv6_socks5_proxy(self, handler, ctx):

# XXX: is there any feasible way of testing IPv6 source addresses?
# Same would go for non-proxy source_address test...
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
def test_ipv4_client_source_address(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
source_address = f'127.0.0.{random.randint(5, 255)}'
Expand All @@ -422,7 +422,7 @@ def test_ipv4_client_source_address(self, handler, ctx):
assert response['client_address'][0] == source_address
assert response['version'] == 5

@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http')], indirect=True)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http')], indirect=True)
@pytest.mark.parametrize('reply_code', [
Socks5Reply.GENERAL_FAILURE,
Socks5Reply.CONNECTION_NOT_ALLOWED,
Expand Down
7 changes: 6 additions & 1 deletion yt_dlp/YoutubeDL.py
Original file line number Diff line number Diff line change
Expand Up @@ -3968,7 +3968,7 @@ def get_encoding(stream):
})) or 'none'))

write_debug(f'Proxy map: {self.proxies}')
# write_debug(f'Request Handlers: {", ".join(rh.RH_NAME for rh in self._request_director.handlers.values())}')
write_debug(f'Request Handlers: {", ".join(rh.RH_NAME for rh in self._request_director.handlers.values())}')
for plugin_type, plugins in {'Extractor': plugin_ies, 'Post-Processor': plugin_pps}.items():
display_list = ['%s%s' % (
klass.__name__, '' if klass.__name__ == name else f' as {name}')
Expand Down Expand Up @@ -4057,6 +4057,9 @@ def urlopen(self, req):
raise RequestError(
'file:// URLs are disabled by default in yt-dlp for security reasons. '
'Use --enable-file-urls to enable at your own risk.', cause=ue) from ue
if 'unsupported proxy type: "https"' in ue.msg.lower():
raise RequestError(
'To use an HTTPS proxy for this request, one of the following dependencies needs to be installed: requests')
raise
except SSLError as e:
if 'UNSAFE_LEGACY_RENEGOTIATION_DISABLED' in str(e):
Expand Down Expand Up @@ -4099,6 +4102,8 @@ def build_request_director(self, handlers, preferences=None):
}),
))
director.preferences.update(preferences or [])
if 'prefer-legacy-http-handler' in self.params['compat_opts']:
director.preferences.add(lambda rh, _: 500 if rh.RH_KEY == 'Urllib' else 0)
return director

def encode(self, s):
Expand Down
4 changes: 3 additions & 1 deletion yt_dlp/__pyinstaller/hook-yt_dlp.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,9 @@ def get_hidden_imports():
yield from ('yt_dlp.compat._legacy', 'yt_dlp.compat._deprecated')
yield from ('yt_dlp.utils._legacy', 'yt_dlp.utils._deprecated')
yield pycryptodome_module()
yield from collect_submodules('websockets')
# Only `websockets` is required, others are collected just in case
for module in ('websockets', 'requests', 'urllib3'):
yield from collect_submodules(module)
# These are auto-detected, but explicitly add them just in case
yield from ('mutagen', 'brotli', 'certifi')

Expand Down
9 changes: 9 additions & 0 deletions yt_dlp/dependencies/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,15 @@
# See https://github.com/yt-dlp/yt-dlp/issues/2633
websockets = None

try:
import urllib3
except ImportError:
urllib3 = None

try:
import requests
except ImportError:
requests = None

try:
import xattr # xattr or pyxattr
Expand Down
10 changes: 10 additions & 0 deletions yt_dlp/networking/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
# flake8: noqa: F401
import warnings

from .common import (
HEADRequest,
PUTRequest,
Expand All @@ -11,3 +13,11 @@
# isort: split
# TODO: all request handlers should be safely imported
from . import _urllib
from ..utils import bug_reports_message

try:
from . import _requests
except ImportError:
pass
except Exception as e:
warnings.warn(f'Failed to import "requests" request handler: {e}' + bug_reports_message())
20 changes: 19 additions & 1 deletion yt_dlp/networking/_helper.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

from .exceptions import RequestError, UnsupportedRequest
from ..dependencies import certifi
from ..socks import ProxyType
from ..socks import ProxyType, sockssocket
from ..utils import format_field, traverse_obj

if typing.TYPE_CHECKING:
Expand Down Expand Up @@ -224,6 +224,24 @@ def _socket_connect(ip_addr, timeout, source_address):
raise


def create_socks_proxy_socket(dest_addr, proxy_args, proxy_ip_addr, timeout, source_address):
af, socktype, proto, canonname, sa = proxy_ip_addr
sock = sockssocket(af, socktype, proto)
try:
connect_proxy_args = proxy_args.copy()
connect_proxy_args.update({'addr': sa[0], 'port': sa[1]})
sock.setproxy(**connect_proxy_args)
if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT: # noqa: E721
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(dest_addr)
return sock
except socket.error:
sock.close()
raise


def create_connection(
address,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
Expand Down
Loading

0 comments on commit 8a8b545

Please sign in to comment.