[core] Fix HTTP redirect handling #7094
with FakeYDL(...) as ydl:
    ...
can be rewritten as ydl = FakeYDL(...), saving one indentation level across the file, since we are currently only using it to set up and tear down the console title.
Co-authored-by: Simon Sawicki <accounts@grub4k.xyz>
I haven't compared the actual logic to the RFC - let me know if I should or you are confident enough to merge without.
While cleaning up legacy code certainly would be nice, I don't want to hold this PR up for that, so I'm not making any suggestions for it.
PS: It's difficult to see what's moved vs new code in test_networking. In future, try to do rename and code change in separate commits.
Otherwise we just leave many servers and threads running till all tests finish
causing issues with gh actions
ee280c7 to 7aeda6c
In Py3.9, this quashed the resource warnings on shutdown (not seen in 3.5 or 2.7):
def tearDown(self):

    def closer(svr):
        def _closer():
            svr.shutdown()
            svr.server_close()
        return _closer

    shutdown_thread = threading.Thread(target=closer(self.http_httpd))
    shutdown_thread.start()
    self.http_server_thread.join(2.0)

    shutdown_thread = threading.Thread(target=closer(self.https_httpd))
    shutdown_thread.start()
    self.https_server_thread.join(2.0)
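For reference, the same shutdown idea as a self-contained sketch using only the standard library. The names `_Handler`, `start_server` and `stop_server` are hypothetical and not taken from this PR or from the test suite; the sketch only assumes that each test server runs `serve_forever()` in its own thread.

```python
import http.server
import threading


class _Handler(http.server.BaseHTTPRequestHandler):
    """Minimal handler (hypothetical) that answers every GET with 'ok'."""

    def do_GET(self):
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep test output quiet
        pass


def start_server():
    # Port 0 asks the OS for any free port
    httpd = http.server.HTTPServer(('127.0.0.1', 0), _Handler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    return httpd, thread


def stop_server(httpd, thread, timeout=2.0):
    httpd.shutdown()       # ask serve_forever() to return
    httpd.server_close()   # release the listening socket
    thread.join(timeout)   # wait for the server thread to finish
    return not thread.is_alive()
```

Calling `stop_server` from `tearDown` for every server started in `setUp` should keep servers and threads from piling up until the whole test run ends.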
In ytdl-org/youtube-dl@c047270c02, yt-dl started to remove each RFC9110 s8.4
Obviously RFC9110 applies to what's sent over the network rather than the processing at either end, but it seems reasonable to apply the same semantics in those contexts, so that the header only lists the encodings that haven't yet been decoded. Work-arounds with a custom header or
This was changed in 65e5c02. Funnily enough, even urllib3/requests don't remove the Content-Encoding header, which is why I gave the go-ahead. I'll have to test the geo verification/per-request proxy. I completely changed it in the network rework, as the implementation is flawed/buggy.
And I actually commented on that PR too. But I hadn't yet tracked down the commit where the removal was added. Looking at the MDN page for content-encoding (tl;dr for the RFCs), the decoding logic is indeed essentially
Reading the HTTP specs again, it seems that a header (like
So my code for this is (where the decompression methods return [...]):

encodings = list(filter(txt_or_none, ', '.join(resp.headers.get_all('Content-encoding', [])).split(',')))
for enc in encodings[::-1]:
    old_resp = resp
    if enc in ('gzip', 'x-gzip'):
        uncompressed = self.gzip_d(resp.read())
        encodings.pop()
    elif enc == 'deflate':
        uncompressed = self.reflate(resp.read())
        encodings.pop()
    elif enc == 'br' and brotli:
        uncompressed = self.brotli(resp.read())
        encodings.pop()
    else:
        break
    resp = compat_urllib_request.addinfourl(
        uncompressed, old_resp.headers, old_resp.url, old_resp.code)
    resp.msg = old_resp.msg
if encodings:
    # remaining encodings
    resp.headers['Content-encoding'] = ' '.join(encodings)
elif 'Content-encoding' in resp.headers:
    del resp.headers['Content-encoding']
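To make the semantics discussed above concrete, here is a minimal standalone sketch (not the yt-dl or yt-dlp implementation). The function name `decode_content` is hypothetical. It assumes the encodings are listed in the order they were applied, so they are undone from right to left, and that `deflate` means zlib-wrapped data; Brotli is left out to keep the sketch to the standard library.

```python
import gzip
import zlib


def decode_content(body, content_encoding):
    """Undo supported encodings from right to left.

    Returns (decoded_body, remaining), where remaining is the header value
    listing the encodings that were NOT decoded, or None if none are left.
    """
    encodings = [e.strip().lower() for e in content_encoding.split(',') if e.strip()]
    while encodings:
        enc = encodings[-1]  # the last-applied encoding is decoded first
        if enc in ('gzip', 'x-gzip'):
            body = gzip.decompress(body)
        elif enc == 'deflate':
            # Assumption: zlib-wrapped deflate; raw deflate streams are not handled
            body = zlib.decompress(body)
        else:
            break  # unsupported encoding: stop and leave it in the header
        encodings.pop()
    return body, (', '.join(encodings) or None)
```

With this shape, a caller can set `Content-Encoding` to the returned value, or delete the header when the value is `None`, so the header only ever describes what is still encoded.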
* Thx coletdjnz: yt-dlp/yt-dlp#7094 * add test that redirected `POST` loses its `Content-Type`
Aligns HTTP redirect handling with what browsers commonly do and RFC standards. Fixes issues yt-dlp@afac4ca missed. Authored by: coletdjnz
Description of your pull request and other information
Backport from #2861
afac4ca never properly fixed the issue (and was not tested, which was part of the problem):
This PR fixes the above issues. It aligns the redirect handling with the latest recommended RFC standard and with what common browsers tend to do (see #2861 for a fuller description).
I have also back-ported some of the networking tests, including the HTTP redirect test.
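As a rough illustration of the redirect rules described above, the sketch below shows one way the method and payload could be chosen for the follow-up request. It is not the code from this PR: `redirect_request_params` is a hypothetical helper, and which headers get dropped (`Content-Type` and `Content-Length` here) is an assumption based on common browser behaviour.

```python
def redirect_request_params(method, code, headers, data=None):
    """Return (method, headers, data) for the request that follows a redirect."""
    if code not in (301, 302, 303, 307, 308):
        raise ValueError('not a redirect status handled here: %d' % code)

    new_method = method
    if code == 303 and method != 'HEAD':
        # A 303 is followed up with GET (HEAD stays HEAD)
        new_method = 'GET'
    elif code in (301, 302) and method == 'POST':
        # Browsers commonly turn a POST into a GET on 301/302
        new_method = 'GET'
    # 307 and 308 keep the original method and payload

    new_headers = dict(headers)
    if new_method != method:
        # The payload is only dropped when the method changed (e.g. POST to GET)
        data = None
        for name in list(new_headers):
            if name.lower() in ('content-length', 'content-type'):
                del new_headers[name]
    return new_method, new_headers, data
```

For example, a `POST` with a JSON body that receives a 302 would be retried as a `GET` without body or `Content-Type`, while the same request receiving a 307 would be resent unchanged.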
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?
Copilot Summary
🤖 Generated by Copilot at f7ce007
Summary
🧹🐛🎨
Improve redirection handling and code style in yt_dlp/utils/_utils.py and remove obsolete code and tests. This pull request enhances the functionality, readability, and maintainability of the codebase.
Walkthrough
YoutubeDLRedirectHandler class ([link](https://github.com/yt-dlp/yt-dlp/pull/7094/files?diff=unified&w=0#diff-feda8f56946dc048e754111850baaef0eec4b9f8bbc2d3f04b1a785626ea5c0eL1682-R1683), [link](https://github.com/yt-dlp/yt-dlp/pull/7094/files?diff=unified&w=0#diff-feda8f56946dc048e754111850baaef0eec4b9f8bbc2d3f04b1a785626ea5c0eL1696-R1717))