[utils] Don't trust Content-length when Content-encoding is present #6176
Our YoutubeDLHandler does auto-decompression, so we cannot compare Content-length to the size of the extracted data. Although HttpFD sets Youtubedl-no-compression, the web server may still ignore this and return deflated data (as is the case in yt-dlp#3772).
This change retains the Content-encoding header for later comparison in HttpFD, so that we can discard data_len when the web server returns compressed data. I couldn't find the reason why this header was previously removed, but I haven't found a counterexample either.
Fixes yt-dlp#3772 in my experiments.
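A minimal sketch of the mismatch described above, using only the standard library. The helper name `expected_length` and the sample payload are hypothetical and not taken from yt-dlp; the point is only that Content-length counts bytes on the wire, so it stops being comparable once the body has been decoded.

```python
import gzip
from email.message import Message


def expected_length(headers):
    """Return the byte count to expect after decoding, or None if unknown.

    Hypothetical helper: Content-length is only comparable to the received
    data when no Content-encoding was applied to the body.
    """
    if headers.get('Content-encoding'):
        return None
    length = headers.get('Content-length')
    return int(length) if length is not None else None


body = b'example payload ' * 64
wire = gzip.compress(body)  # what a server ignoring "no compression" might send

headers = Message()
headers['Content-length'] = str(len(wire))
headers['Content-encoding'] = 'gzip'

decoded = gzip.decompress(wire)  # what an auto-decompressing handler hands over

# The decoded size no longer matches the advertised length...
assert len(decoded) != int(headers['Content-length'])
# ...so the length is treated as unknown instead of being trusted.
assert expected_length(headers) is None
```

With the header retained, a downloader can fall back to "length unknown" rather than reporting a size mismatch.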
620d2ea to d1ea465
cc @coletdjnz
note for self: check against urllib3/requests too
yt_dlp/downloader/http.py
@@ -213,6 +213,11 @@ def close_stream():
 def download():
     data_len = ctx.data.info().get('Content-length', None)

+    if ctx.data.info().get('Content-encoding', None):
None is default?
- if ctx.data.info().get('Content-encoding', None):
+ if ctx.data.info().get('Content-encoding'):
Or maybe
- if ctx.data.info().get('Content-encoding', None):
+ if ctx.data.info().get('Content-encoding') is not None:
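The forms suggested above can be compared in a short sketch. It assumes the headers object behaves like `email.message.Message` (which `http.client.HTTPMessage` subclasses); the three sample messages are made up for illustration.

```python
from email.message import Message

missing = Message()                  # no Content-encoding header at all
empty = Message()
empty['Content-encoding'] = ''       # header present but empty
gzipped = Message()
gzipped['Content-encoding'] = 'gzip'

samples = (missing, empty, gzipped)

# .get() already defaults to None, so passing None explicitly is redundant.
assert missing.get('Content-encoding', None) is missing.get('Content-encoding')

# A truthiness check and an explicit None check differ only for an empty value.
truthy = [bool(m.get('Content-encoding')) for m in samples]
not_none = [m.get('Content-encoding') is not None for m in samples]
```

Here `truthy` treats an empty header the same as a missing one, while `not_none` treats it as present, so the choice between the two suggestions matters only if a server sends an empty Content-encoding value.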
Yes, I was just keeping the same coding style, since the data_len line also passes , None. I'll try to remove both then.
When I worked on this code I inferred that it was trying to handle nested encodings, or perhaps more realistically to avoid decoding already decoded content. The old yt-dl code was like this:
I see. An alternative approach would be to move the info into a custom header like "Youtubedl-decompressed" and check that in HttpFD instead.
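A rough sketch of that alternative, assuming the headers behave like `email.message.Message`. Both function names are hypothetical and only the header name comes from the comment above; this is not how yt-dlp implements it.

```python
from email.message import Message

DECOMPRESSED_MARKER = 'Youtubedl-decompressed'  # header name from the comment above


def mark_decompressed(headers):
    """Hypothetical handler-side step: drop Content-encoding after decoding
    and record the original encoding under the marker header instead."""
    encoding = headers.get('Content-encoding')
    if encoding:
        del headers['Content-encoding']
        headers[DECOMPRESSED_MARKER] = encoding
    return headers


def usable_length(headers):
    """Hypothetical downloader-side check: ignore Content-length when the
    handler has flagged the body as decoded."""
    if headers.get(DECOMPRESSED_MARKER):
        return None
    length = headers.get('Content-length')
    return int(length) if length is not None else None


response_headers = Message()
response_headers['Content-length'] = '120'
response_headers['Content-encoding'] = 'deflate'

mark_decompressed(response_headers)
assert response_headers.get('Content-encoding') is None
assert usable_length(response_headers) is None
```

The trade-off is that other code reading the response no longer sees Content-encoding, whereas retaining the header keeps the response closer to what the server sent.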
urllib3/requests appear to retain the content-encoding header after decompression too, so this should be fine in that regard (needed to check due to #2861 and #3668).
Shouldn't be an issue, as decoding should only be attempted once (i.e. content-encoding is read once for the whole decoding process, including multi-decoding). I don't see us needing any way of checking whether the content is decoded either (and if it is in an unsupported encoding, we can catch that elsewhere), so including the header even when the content has been decoded should be fine.
Networking changes look fine to me.
#3772 seems to work for me too now, assuming you don't use --test (@pukkandan not sure if that is a concern)
Let's not worry about that for now. Make sure to port this to the networking PR too.
Authored by: felixonmars Closes yt-dlp#3772, yt-dlp#6178