Deprecate schemeless URLs? #2920

Closed
kenballus opened this issue Feb 14, 2023 · 8 comments
Labels
💰 Bounty $300 If you complete this issue we'll pay you $300 on OpenCollective!


@kenballus
Contributor

Context

Schemeless URLs can cause a lot of parsing confusion because they aren't standardized in the RFC. Examples like the following are especially weird, and could cause problems when interoperability with other parsers matters:

"evil.com://good.com"

Here are some popular URL parsers' interpretations:
urllib3:

Scheme:   (nil)
Userinfo: (nil)
Host:     evil.com
Port:     (nil)
Path:     //good.com
Query:    (nil)
Fragment: (nil)

cpython urllib:

Scheme:   evil.com
Userinfo: (nil)
Host:     good.com
Port:     (nil)
Path:     (nil)
Query:    (nil)
Fragment: (nil)

furl:

Scheme:   evil.com
Userinfo: (nil)
Host:     good.com
Port:     (nil)
Path:     (nil)
Query:    (nil)
Fragment: (nil)

hyperlink:

Scheme:   evil.com
Userinfo: (nil)
Host:     good.com
Port:     (nil)
Path:     /
Query:    (nil)
Fragment: (nil)

rfc3986:

Scheme:   evil.com
Userinfo: (nil)
Host:     good.com
Port:     (nil)
Path:     (nil)
Query:    (nil)
Fragment: (nil)

yarl:

Scheme:   evil.com
Userinfo: (nil)
Host:     good.com
Port:     (nil)
Path:     /
Query:    (nil)
Fragment: (nil)

As you can see, we're the outlier here.
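For anyone who wants to reproduce the CPython row above, here is a minimal sketch using only the standard library's `urllib.parse.urlsplit`. The `describe` helper is a name made up for this example; note that `urlsplit` reports a missing path as an empty string rather than `(nil)`.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Hypothetical helper: split a URL and return the components of interest."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


# The text before "://" is syntactically a valid scheme, so CPython treats
# "evil.com" as the scheme and "good.com" as the host.
result = describe("evil.com://good.com")
print(result)
```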

In my opinion, this is something worth fixing, but I imagine that schemeless URLs are in pretty widespread use with urllib3. Thus, we might consider adding a DeprecationWarning that encourages people to explicitly state their schemes.
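A rough sketch of what such a warning could look like is below. This is not urllib3's actual code: `warn_if_schemeless`, the warning text, and the simple prefix check are all assumptions made for illustration.

```python
import warnings


def warn_if_schemeless(url: str) -> None:
    """Hypothetical helper: warn when a URL lacks an explicit http(s) scheme."""
    if not url.lower().startswith(("http://", "https://")):
        warnings.warn(
            f"URL {url!r} has no explicit 'http://' or 'https://' scheme; "
            "schemeless URLs are deprecated.",  # illustrative wording only
            DeprecationWarning,
            stacklevel=2,
        )


# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_schemeless("example.com/path")          # warns
    warn_if_schemeless("https://example.com/path")  # does not warn

print(len(caught), caught[0].category.__name__)
```

`stacklevel=2` attributes the warning to the caller's line rather than to the helper, which is usually what makes a deprecation actionable.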

@sethmlarson
Member

Totally agree that this is something worth fixing, thank you for doing all this research and opening this. My hope is that schemeless URLs aren't as widespread as we think and that we've been supporting this mostly for backwards compatibility and not because a large swath of the ecosystem relies on it.

I think we should add a DeprecationWarning, as you mentioned, that links to this issue (we can edit the issue body to provide actionable feedback for users) and says that removal will occur in a future version. We can then release that in 1.26.x and 2.x to start warning users and gather feedback if necessary.

@sigmavirus24
Contributor

In my experience, these often occur in the Location header and require resolution per the specifications. I don't remember how urllib3 handles redirects, but supporting them may require this.
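To illustrate the kind of resolution meant here, a minimal sketch with the standard library's `urllib.parse.urljoin` follows. It is not urllib3's redirect code, and the base URL and Location values are invented for the example.

```python
from urllib.parse import urljoin

# URL of the request that received the redirect (assumed for this example).
base = "https://example.com/a/b"

# Schemeless Location values are resolved against the request URL.
locations = ["/login", "//cdn.example.com/x", "c"]
resolved = {loc: urljoin(base, loc) for loc in locations}

for loc, target in resolved.items():
    print(f"{loc!r} -> {target}")
```

A redirect handler that rejected every schemeless string outright would break these cases, so any deprecation probably needs to treat Location values separately from user-supplied URLs.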

@DarkPhily

Hi,
I'm interested in contributing.
I would like to give this issue a try, but I'm not sure what the end result should be.
Raise a DeprecationWarning?
Change the parsing to be more in line with other libraries?

Some guidance would be much appreciated.

@Mr-Sunglasses
Contributor

Hey @kenballus, @sethmlarson, is this issue available to work on?

@JohnJamesUtley

@kenballus and I are working on this right now.

@JohnJamesUtley

JohnJamesUtley commented May 18, 2023

I removed support for schemeless URLs in a branch from my fork:
https://github.com/JohnJamesUtley/urllib3/tree/remove_schemeless_urls

Let me know if you want me to make a pull request.

Edit: The branch is based on the RFC 3986 specification, and I checked that it conforms to it.
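For reference, RFC 3986 (Appendix B) gives a regular expression for splitting a URI reference into components. The sketch below applies it to the example from this issue. It is not the code in the branch above, and `split_rfc3986` is a name invented here.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_rfc3986(ref: str) -> dict:
    """Hypothetical helper: split a URI reference using the Appendix B regex."""
    m = URI_RE.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


with_colon = split_rfc3986("evil.com://good.com")
schemeless = split_rfc3986("example.com/path")
print(with_colon)
print(schemeless)
```

With this split, `evil.com` lands in the scheme and `good.com` in the authority, while a bare `example.com/path` has no scheme or authority and is treated entirely as a path.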

@sethmlarson
Member

Closed in #2950

@sethmlarson
Member

Thanks @Ousret! I've approved your expense in OpenCollective for $300.

wimglenn added a commit to wimglenn/pook that referenced this issue Jul 8, 2023
…thout a scheme (ie 'https://') are deprecated and will raise an error in a future version of urllib3. To avoid this DeprecationWarning ensure all URLs start with 'https://' or 'http://'. Read more in this issue: urllib3/urllib3#2920

6 participants