[networking] Remove dot segments during URL normalization #7662
Does this handle the problem URL |
Yes, it works (though I think the domain is dead), and I am pretty sure it is not against the standard (at least RFC 3986, which I believe browsers follow for this): RFC 3986 5.2.4 Step A:
Step C is similar for |
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
* move processing to YoutubeDLHandler
* also process `Location` header for redirect
* use tests from yt-dlp/yt-dlp#7662
This implements RFC3986 5.2.4 remove_dot_segments during the URL normalization process. Closes yt-dlp#3355, yt-dlp#6526 Authored by: coletdjnz
This implements RFC3986 5.2.4 `remove_dot_segments` during the URL normalization process, particularly for the urllib handler.
Closes #3355, #6526
This is adapted from the remove_dot_segments pseudo-code in the RFC, with some inspiration from the urllib3/rfc3986 libraries (though it came out very close to them).
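For readers unfamiliar with RFC 3986 5.2.4, here is a minimal segment-based sketch of the idea. It is not the code from this PR: the name `remove_dot_segments_sketch` is made up for illustration, and the RFC describes the algorithm with input/output buffers rather than a split on `/`.

```python
def remove_dot_segments_sketch(path: str) -> str:
    """Illustrative take on RFC 3986 5.2.4 (hypothetical helper, not yt-dlp's)."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            # "." refers to the current level, so it is simply dropped
            continue
        if segment == "..":
            # ".." removes the previous segment, if there is one
            if output:
                output.pop()
            continue
        output.append(segment)
    # An absolute path must stay absolute even if ".." consumed the leading ""
    if segments[0] == "" and (not output or output[0] != ""):
        output.insert(0, "")
    # A trailing "." or ".." names a directory, so keep a trailing slash
    if segments[-1] in (".", ".."):
        output.append("")
    return "/".join(output)


print(remove_dot_segments_sketch("/a/b/c/./../../g"))      # /a/g
print(remove_dot_segments_sketch("mid/content=5/../6"))    # mid/6
print(remove_dot_segments_sketch("/a/b/./../../headers"))  # /headers
```

The first two inputs are the worked examples from the RFC; the third is the path used by the test added in this PR.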
I have also renamed `escape_url` to `normalize_url` to better represent what it is doing, and moved these functions to `utils.networking`.
Copilot Summary
🤖 Generated by Copilot at 258810c
Summary
🚚🔧🧪
This pull request refactors and improves the URL handling functions in yt-dlp. It introduces a new function `normalize_url` that removes dot segments and escapes non-ASCII characters in a URL, and uses it in various modules. It also adds a new test case for the dot segment removal algorithm.
Walkthrough
* Move the URL handling functions to `yt_dlp/utils/networking.py` and implement the RFC 3986 5.2.4 algorithm for removing dot segments from a path
* Add a `/redirect_dotsegments` path in the `HTTPTestRequestHandler` class, which sends a 301 redirect response with a `Location` header that contains dot segments
* Add a test case for the `remove_dot_segments` and `normalize_url` functions, which uses the `HTTPTestRequestHandler` class and the `validate_and_send` function to send two requests: one to the path `/a/b/./../../headers` and one to the path `/redirect_dotsegments`. The test case asserts that both requests result in a 200 status and a final URL of `/headers`
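To make the summary's description of `normalize_url` concrete, here is a rough sketch of how dot segment removal and non-ASCII escaping could be combined using only the standard library. It is not the code from this PR: `normalize_url_sketch`, `_strip_dots`, and the `_SAFE` character set are assumptions made for illustration, and the host is left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters that quote() leaves alone in this sketch, so that existing
# %XX escapes and URL delimiters are not encoded again (an assumption).
_SAFE = "%/;:@&=+$,!~*'()?#[]"


def _strip_dots(path):
    # Compact dot segment removal in the spirit of RFC 3986 5.2.4
    segments, output = path.split("/"), []
    for segment in segments:
        if segment == "..":
            if output:
                output.pop()
        elif segment != ".":
            output.append(segment)
    if segments[0] == "" and (not output or output[0] != ""):
        output.insert(0, "")  # keep absolute paths absolute
    if segments[-1] in (".", ".."):
        output.append("")  # a trailing dot segment leaves a trailing slash
    return "/".join(output)


def normalize_url_sketch(url):
    """Hypothetical helper: drop dot segments, percent-encode non-ASCII."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # host handling (e.g. IDNA) is out of scope here
        quote(_strip_dots(parts.path), safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


print(normalize_url_sketch("http://example.com/a/b/./../../headers"))
# http://example.com/headers
print(normalize_url_sketch("http://example.com/é/../ü?q=ñ"))
# http://example.com/%C3%BC?q=%C3%B1
```

Removing dot segments before escaping keeps the two steps independent, since percent-encoding never introduces or removes a `/`.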