Encoding URL string in URL #2618

Closed
mkykadir opened this issue Oct 8, 2021 · 2 comments · Fixed by #2624

@mkykadir
Contributor

mkykadir commented Oct 8, 2021

Constructing a URL from a plain URL string, like http://example.com/hello-🌍 or http://example.com/merhaba-dünya, throws an exception with the message "Invalid character in internet path.".

Currently the constructor expects a percent-encoded URL string, because InetPath validation expects percent-encoded input, which is a consistent implementation within InetPath. However, GenericPath also has a Segment2 implementation that can encode the segments in the path of the URL, so the URL constructor could provide InetPath with already-encoded segments.

Developers can still construct a URL by building an InetPath from percent-encoded Segment2 path segments, but this can be overwhelming. I think the constructor should expect a plain URL by default, to improve ease of use and the developer experience.

@mkykadir
Contributor Author

There are two ways we can percent-encode a plain URL from user input.
The first is shown in one of the unit tests. Since PosixPath only checks for the \0 character during its validation, it accepts all Unicode characters. Casting a PosixPath to an InetPath reconstructs the path through Segment2, so percent encoding happens. However, this method does not percent-encode the query and anchor parts of the URL.
The second would be to use std.uri's encode function, which can percent-encode the path, query, and anchor in one go.
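
To make the difference between the two options concrete, here is a minimal, language-neutral sketch in Python. It is not vibe.d code and does not reproduce the behavior of Segment2 or std.uri exactly; the function names encode_path_only and encode_whole are hypothetical, and the set of characters left untouched in the second function is an assumption chosen for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path_only(url: str) -> str:
    """Percent-encode each path segment; leave query and fragment untouched."""
    parts = urlsplit(url)
    # Encode every segment on its own so the "/" separators survive.
    encoded_path = "/".join(quote(seg, safe="") for seg in parts.path.split("/"))
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )

# Assumption: URL delimiters and common "mark" characters are left as-is.
KEEP = ";/?:@&=+$,#" + "-_.!~*'()"

def encode_whole(url: str) -> str:
    """Percent-encode the whole string in one pass, keeping URL delimiters."""
    return quote(url, safe=KEEP)

plain = "http://example.com/merhaba-dünya?q=dünya#dünya"
print(encode_path_only(plain))  # query and fragment still contain "ü"
print(encode_whole(plain))      # path, query and fragment are all encoded
```

Note that the one-pass variant also escapes a literal "%", so feeding it an already percent-encoded string would encode it twice.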

Which would be the preferred method, @s-ludwig?

@s-ludwig
Member

Since paths in URLs can contain a slash as part of a path segment, we probably shouldn't do this in the canonical URL parser (e.g. /packages/foo%2Fbar/settings is different from /packages/foo/bar/settings). In general, I'd like to keep the basic parser as close to the actual URL (schema) specification as possible. But @Geod24 has proposed parsing URLs as defined in https://url.spec.whatwg.org/. This is the way browsers parse URLs entered in the address bar. Among other things, it also handles additional percent encoding, as well as Punycode encoding for the host name.
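
The %2F point can be illustrated with a short Python sketch (again not vibe.d code; the helper name segments is hypothetical). Splitting the raw path first and decoding each segment afterwards keeps the encoded slash inside its segment, which a parser that starts from a plain, unencoded string has no way to recover.

```python
from urllib.parse import unquote

def segments(raw_path: str) -> list[str]:
    """Split on literal '/' first, then decode each segment individually."""
    return [unquote(seg) for seg in raw_path.strip("/").split("/")]

# Three segments: the middle one contains a slash.
print(segments("/packages/foo%2Fbar/settings"))
# Four segments: "foo" and "bar" are separate.
print(segments("/packages/foo/bar/settings"))
```

Once both paths are written as plain text they look identical, so automatically encoding a plain string can only ever produce the four-segment form.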

I'd propose to add this as a prominent API function in addition to the usual URL constructor. The goal should be to be compliant with the WHATWG specification, but we can go there piece by piece.
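
As a rough idea of what such an additional, lenient entry point might do, here is a small Python sketch. It is only an approximation under stated assumptions: it is not WHATWG-compliant, the function name lenient_url is hypothetical, user info in the authority is dropped for brevity, and Python's built-in "idna" codec (IDNA 2003) stands in for the host processing the specification describes.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def lenient_url(text: str) -> str:
    """Turn a plain, human-typed URL into an ASCII-only URL string."""
    parts = urlsplit(text)
    # Punycode-encode the host name; user info is ignored in this sketch.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Keep "%" so that existing escapes such as %2F are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

print(lenient_url("http://münchen.de/merhaba-dünya?q=🌍"))
print(lenient_url("http://example.com/packages/foo%2Fbar/settings"))
```

Keeping this separate from the strict constructor means already-encoded input such as foo%2Fbar passes through unchanged, while plain input is encoded piece by piece.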
