Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tilde characters encoded in URLs starting in 1.25.4 #1683

Closed
jeblair opened this issue Sep 19, 2019 · 4 comments · Fixed by #1692
Closed

Tilde characters encoded in URLs starting in 1.25.4 #1683

jeblair opened this issue Sep 19, 2019 · 4 comments · Fixed by #1692
Assignees

Comments

@jeblair
Copy link

jeblair commented Sep 19, 2019

Release 1.25.4 has a behavior change due to #1673 -- tilde characters in URLs are now percent-encoded. Some remote systems may not decode path components if they are not expecting any input other than unreserved characters. In such cases, this behavior change may cause user-visible errors.

jeblair pushed a commit to jeblair/urllib3 that referenced this issue Sep 19, 2019
Fixes urllib3#1683.

Recently merged urllib3#1673 ensured that URLs are percent-encoded.
This is a behavior change in that URLs with tilde characters in
them would not previously have been percent-encoded, but are now.

RFC 3986 says[1]:

   For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.

This suggests that urllib3 should not escape tilde characters
in URLs.  The RFC describes tilde as an "unreserved" character.
Among the character classes at the top of url.py, the closest
match to the "unreserved" set seems to be ZONE_ID_CHARS.  RFC6874
says:

    A <zone_id> SHOULD contain only ASCII characters classified as
    "unreserved" for use in URIs [RFC3986].

Which suggests that it should be safe to add to that set.

[1] https://tools.ietf.org/html/rfc3986#section-2.3
[2] https://tools.ietf.org/html/rfc6874#section-2
@lmazuel
Copy link

lmazuel commented Sep 23, 2019

For instance, it breaks code like the Azure Storage signing, because it's building a signature based on URL, and now the service refuses our request with message like that:

E       AuthenticationErrorDetail:The MAC signature found in the HTTP request '+9LrJpPDKHcsIcv+cxVVmD0147OshzlnRn+YLRgJdSE=' is not the same as any computed signature. Server used following string to sign: 'PUT
E (edited lines)
E       /amqptest/utcontainer6826924cfebd44dc847eac07516e17c1/%7Ea%7Ea%7E'.

Because we do the call with /amqptest/utcontainer6826924cfebd44dc847eac07516e17c1/~a~a~, so we encode our signature with the ~ string, and the service actually receives a %7E string.

Just to be sure you don't get my message wrongly: maybe you do the right thing and Azure is wrong here in pure term of HTTP spec, but at least as @jeblair said that's a change of behavior in a bugfix version and definitely breaks people :(

@sethmlarson
Copy link
Member

I'll take a look at this and prepare a release if needed once it's resolved. Thanks for the report.

@lasote
Copy link

lasote commented Sep 24, 2019

It is breaking badly for us, as the redirect download urls contains a "~" and it transform it to %7E. All the downloads from conan-center repository are now broken.

conan-io/conan#5780

@sethmlarson
Copy link
Member

This is fixed in urllib3 1.25.6

crizCraig added a commit to deepdrive/deepdrive-sim that referenced this issue Oct 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants