Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"html" error mode somewhat incompatible with URLs #139

Closed
annevk opened this issue May 8, 2018 · 6 comments
Closed

"html" error mode somewhat incompatible with URLs #139

annevk opened this issue May 8, 2018 · 6 comments

Comments

@annevk
Copy link
Member

annevk commented May 8, 2018

For some reason I only just realized that the current "html" error mode results in URLs where & (and ;) is not escaped. There's quite a few server-side libraries that split on & and ;. Maybe that means we should continue to offer an error mode specifically for URLs that in addition to outputting this sequences, also percent-encodes the & and ; as Chrome appears to be doing.

On the other hand, Firefox has shipped the currently specified behavior and we didn't get bug reports.

@annevk
Copy link
Member Author

annevk commented May 8, 2018

cc @hsivonen @inexorabletash

@inexorabletash
Copy link
Member

FYI some discussion on the Blink side here:

https://bugs.chromium.org/p/chromium/issues/detail?id=795733

https://chromium-review.googlesource.com/c/chromium/src/+/836707

There's a decent rationale for Chrome's current behavior towards the bottom of the first link. Current status: no-one is actively pushing to change either Chrome or spec/WPT here.

@hsivonen
Copy link
Member

hsivonen commented May 9, 2018

Why doesn't this belong on the URL standard side? (I.e. the URL side splitting the query string and then encoding it component-wise and percent-encoding each component in a mode that encodes & and ;?) Otherwise, wouldn't the URL layer re-encoding the % characters?

@hsivonen
Copy link
Member

hsivonen commented May 9, 2018

Otherwise, wouldn't the URL layer re-encoding the % characters?

Oh, it wouldn't.

@annevk
Copy link
Member Author

annevk commented May 9, 2018

Currently https://url.spec.whatwg.org/#query-state does not distinguish between & being part of the URL and it coming back from the encode operation.

So we could solve this in the URL Standard instead if we accept that we need to encode code point for code point (to avoid encoding a & in the input). (There's probably a more complicated scheme possible here for optimization purposes, but I'm not sure I want the standard to contain that.)

(As far as <form> goes, in Chrome this only affects application/x-www-form-urlencoded it seems. For text/plain and multipart/form-data there's no additional percent-encoding. However, it seems that per the URL Standard the appropriate escaping already happens there.)

@annevk
Copy link
Member Author

annevk commented May 9, 2018

I created whatwg/url#386 to address this. As far as I can tell Henri is indeed correct that we don't need to touch the Encoding Standard for this.

annevk added a commit to web-platform-tests/wpt that referenced this issue May 9, 2018
See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to @rakuco's work on trying to align Chrome with the earlier iteration of the specification.)
annevk added a commit to web-platform-tests/wpt that referenced this issue May 9, 2018
See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to @rakuco's work on trying to align Chrome with the earlier iteration of the specification.)
annevk added a commit to web-platform-tests/wpt that referenced this issue May 10, 2018
See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to @rakuco's work on trying to align Chrome with the earlier iteration of the specification.)
annevk added a commit to web-platform-tests/wpt that referenced this issue May 22, 2018
See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to @rakuco's work on trying to align Chrome with the earlier iteration of the specification.)
annevk added a commit to whatwg/url that referenced this issue May 23, 2018
If the input to the URL parser contains code points outside the non-UTF-8 encoding's value space and the URL parser was invoked using a non-UTF-8 encoding, then those code points end up as &#...;.

The problem is that &, #, and ; are also URL separators, but the previous algorithm would only encode #. This ensures that & and ; are also encoded, as some browsers already do, but only if they came about as the result of the encode operation.

Tests: web-platform-tests/wpt#10915.

Fixes whatwg/encoding#139.
sideshowbarker pushed a commit to web-platform-tests/wpt that referenced this issue May 23, 2018
See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to @rakuco's work on trying to align Chrome with the earlier iteration of the specification.)
jgraham pushed a commit to web-platform-tests/wpt that referenced this issue May 23, 2018
See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to @rakuco's work on trying to align Chrome with the earlier iteration of the specification.)
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jun 5, 2018
…, a=testonly

Automatic update from web-platform-testsURL/Encoding: change query state parsing

See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to @rakuco's work on trying to align Chrome with the earlier iteration of the specification.)

--

wpt-commits: e399a2c694345240639c23cc5e9e4f077a47cf30
wpt-pr: 10915
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 3, 2019
…, a=testonly

Automatic update from web-platform-testsURL/Encoding: change query state parsing

See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to rakuco's work on trying to align Chrome with the earlier iteration of the specification.)

--

wpt-commits: e399a2c694345240639c23cc5e9e4f077a47cf30
wpt-pr: 10915

UltraBlame original commit: 13f3705568922e770ec97af2aad3e09e0449caa6
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 3, 2019
…, a=testonly

Automatic update from web-platform-testsURL/Encoding: change query state parsing

See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to rakuco's work on trying to align Chrome with the earlier iteration of the specification.)

--

wpt-commits: e399a2c694345240639c23cc5e9e4f077a47cf30
wpt-pr: 10915

UltraBlame original commit: 13f3705568922e770ec97af2aad3e09e0449caa6
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 3, 2019
…, a=testonly

Automatic update from web-platform-testsURL/Encoding: change query state parsing

See whatwg/encoding#139 for rationale and whatwg/url#386 for the change to the URL Standard.

(I found all these resources in part due to rakuco's work on trying to align Chrome with the earlier iteration of the specification.)

--

wpt-commits: e399a2c694345240639c23cc5e9e4f077a47cf30
wpt-pr: 10915

UltraBlame original commit: 13f3705568922e770ec97af2aad3e09e0449caa6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants