Tilde characters encoded in URLs starting in 1.25.4 #1683

jeblair · 2019-09-19T22:28:12Z

Release 1.25.4 has a behavior change due to #1673 -- tilde characters in URLs are now percent-encoded. Some remote systems may not decode path components if they are not expecting any input other than unreserved characters. In such cases, this behavior change may cause user-visible errors.

Fixes urllib3#1683. Recently merged urllib3#1673 ensured that URLs are percent-encoded. This is a behavior change in that URLs with tilde characters in them would not previously have been percent-encoded, but are now. RFC 3986 says[1]: For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers. This suggests that urllib3 should not escape tilde characters in URLs. The RFC describes tilde as an "unreserved" character. Among the character classes at the top of url.py, the closest match to the "unreserved" set seems to be ZONE_ID_CHARS. RFC6874 says: A <zone_id> SHOULD contain only ASCII characters classified as "unreserved" for use in URIs [RFC3986]. Which suggests that it should be safe to add to that set. [1] https://tools.ietf.org/html/rfc3986#section-2.3 [2] https://tools.ietf.org/html/rfc6874#section-2

lmazuel · 2019-09-23T19:18:29Z

For instance, it breaks code like the Azure Storage signing, because it's building a signature based on URL, and now the service refuses our request with message like that:

E       AuthenticationErrorDetail:The MAC signature found in the HTTP request '+9LrJpPDKHcsIcv+cxVVmD0147OshzlnRn+YLRgJdSE=' is not the same as any computed signature. Server used following string to sign: 'PUT
E (edited lines)
E       /amqptest/utcontainer6826924cfebd44dc847eac07516e17c1/%7Ea%7Ea%7E'.

Because we do the call with /amqptest/utcontainer6826924cfebd44dc847eac07516e17c1/~a~a~, so we encode our signature with the ~ string, and the service actually receives a %7E string.

Just to be sure you don't get my message wrongly: maybe you do the right thing and Azure is wrong here in pure term of HTTP spec, but at least as @jeblair said that's a change of behavior in a bugfix version and definitely breaks people :(

sethmlarson · 2019-09-23T20:27:07Z

I'll take a look at this and prepare a release if needed once it's resolved. Thanks for the report.

lasote · 2019-09-24T07:07:50Z

It is breaking badly for us, as the redirect download urls contains a "~" and it transform it to %7E. All the downloads from conan-center repository are now broken.

conan-io/conan#5780

sethmlarson · 2019-09-24T14:35:16Z

This is fixed in urllib3 1.25.6

…he eventually urllib3/urllib3#1683 (comment)

jeblair mentioned this issue Sep 19, 2019

Add tilde to ZONE_ID_CHARS #1684

Closed

rakshith91 mentioned this issue Sep 23, 2019

[Storage Blob] Request breaks when blob_name has a '~' character. Azure/azure-sdk-for-python#7407

Closed

sethmlarson self-assigned this Sep 23, 2019

This was referenced Sep 24, 2019

"Please log in to "conan-center" to perform this action." for several packages conan-io/conan#5780

Closed

Release 1.18.5 conan-io/conan#5801

Closed

Consider modify restriction of urllib3 at requirements.txt conan-io/conan#5802

Closed

minrk mentioned this issue Sep 24, 2019

blacklist urllib3 versions with encoding bug jupyterhub/jupyterhub#2743

Merged

sethmlarson mentioned this issue Sep 24, 2019

Don't percent encode tilde characters in URL #1692

Merged

sethmlarson closed this as completed in #1692 Sep 24, 2019

crizCraig added a commit to deepdrive/deepdrive-sim that referenced this issue Oct 2, 2019

This fixes the dependency conflict earlier, but need to clear pip cac…

8157bfc

…he eventually urllib3/urllib3#1683 (comment)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tilde characters encoded in URLs starting in 1.25.4 #1683

Tilde characters encoded in URLs starting in 1.25.4 #1683

jeblair commented Sep 19, 2019

lmazuel commented Sep 23, 2019 •

edited

sethmlarson commented Sep 23, 2019

lasote commented Sep 24, 2019

sethmlarson commented Sep 24, 2019

Tilde characters encoded in URLs starting in 1.25.4 #1683

Tilde characters encoded in URLs starting in 1.25.4 #1683

Comments

jeblair commented Sep 19, 2019

lmazuel commented Sep 23, 2019 • edited

sethmlarson commented Sep 23, 2019

lasote commented Sep 24, 2019

sethmlarson commented Sep 24, 2019

lmazuel commented Sep 23, 2019 •

edited