New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tilde characters encoded in URLs starting in 1.25.4 #1683
Comments
Fixes urllib3#1683. Recently merged urllib3#1673 ensured that URLs are percent-encoded. This is a behavior change in that URLs with tilde characters in them would not previously have been percent-encoded, but are now. RFC 3986 says[1]: For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers. This suggests that urllib3 should not escape tilde characters in URLs. The RFC describes tilde as an "unreserved" character. Among the character classes at the top of url.py, the closest match to the "unreserved" set seems to be ZONE_ID_CHARS. RFC6874 says: A <zone_id> SHOULD contain only ASCII characters classified as "unreserved" for use in URIs [RFC3986]. Which suggests that it should be safe to add to that set. [1] https://tools.ietf.org/html/rfc3986#section-2.3 [2] https://tools.ietf.org/html/rfc6874#section-2
For instance, it breaks code like the Azure Storage signing, because it's building a signature based on URL, and now the service refuses our request with message like that: E AuthenticationErrorDetail:The MAC signature found in the HTTP request '+9LrJpPDKHcsIcv+cxVVmD0147OshzlnRn+YLRgJdSE=' is not the same as any computed signature. Server used following string to sign: 'PUT
E (edited lines)
E /amqptest/utcontainer6826924cfebd44dc847eac07516e17c1/%7Ea%7Ea%7E'. Because we do the call with Just to be sure you don't get my message wrongly: maybe you do the right thing and Azure is wrong here in pure term of HTTP spec, but at least as @jeblair said that's a change of behavior in a bugfix version and definitely breaks people :( |
I'll take a look at this and prepare a release if needed once it's resolved. Thanks for the report. |
It is breaking badly for us, as the redirect download urls contains a "~" and it transform it to |
This is fixed in urllib3 1.25.6 |
Release 1.25.4 has a behavior change due to #1673 -- tilde characters in URLs are now percent-encoded. Some remote systems may not decode path components if they are not expecting any input other than unreserved characters. In such cases, this behavior change may cause user-visible errors.
The text was updated successfully, but these errors were encountered: