Do not use percent decode on strings #3111
|
Firefox and Safari use "UTF-8 decode without BOM" here. It seems that Edge and Chrome do not. I tested with |
|
Note that we regressed defining what happens upon failure in c230f55. We could revert to that behavior instead, terminating the algorithm when decode returns failure, but I would prefer not using the "or fail" variant at all per above. |
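To make the difference between the two hooks concrete, here is a minimal Python sketch. It is an illustration, not the spec algorithm: the function names are made up for this example, and Python's `utf-8` codec with `errors="replace"` is assumed to be a close enough stand-in for "UTF-8 decode without BOM" (it does not strip a BOM and substitutes U+FFFD for invalid bytes).

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def decode_fragment_lossy(fragment: str) -> str:
    """Percent-decode, then decode as UTF-8; invalid bytes become U+FFFD.

    Stand-in for "UTF-8 decode without BOM": it never fails.
    """
    raw = unquote_to_bytes(fragment)
    return raw.decode("utf-8", errors="replace")


def decode_fragment_or_fail(fragment: str) -> Optional[str]:
    """Percent-decode, then decode as UTF-8; return None on invalid bytes.

    Stand-in for "UTF-8 decode without BOM or fail", where the caller
    has to define what happens on failure.
    """
    raw = unquote_to_bytes(fragment)
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


# Valid UTF-8: both variants agree.
print(decode_fragment_lossy("%C3%BF") == decode_fragment_or_fail("%C3%BF"))
# Invalid UTF-8 (a lone 0xFF byte): one yields U+FFFD, the other fails.
print(repr(decode_fragment_lossy("%FF")), decode_fragment_or_fail("%FF"))
```

With the non-failing variant there is no failure branch left to define, which is the point made above.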
|
While creating tests I discovered this only happens if the document encoding is UTF-8. It seems like fragments might be tied to the document encoding too somehow? |
|
It also seems that none of the hooks defined in https://encoding.spec.whatwg.org/#specification-hooks work here, as even @hsivonen thoughts on what our options are here and what we should do? |
|
Note that I'm not sure what browsers do here makes a whole lot of sense, given that when the URL is parsed they'll translate an input in a fragment such as U+00FF to its UTF-8 encoded equivalent. Meaning that if you have |
|
Sounds like there's still some work to do here, so tagging accordingly. Also happy if you want to split up the quick fixes (e.g. using the correct string-accepting algorithms) from the normative fixes. |
|
Per my analysis in my last comment I think it would be best if we could merge this as-is and get implementations to align. I think not using UTF-8 decode here would more likely result in breakage of legacy pages. Granted, they might have been broken for quite a while now. It would be nice to hear at least one implementation agree though before filing bugs. |
|
OK, well, at least there will be tests so we can see how bad the mismatch with implementations is, right? I'm a little confused how changing things to differ from how implementations work now could cause breakage of legacy pages, and it seems like the kind of thing we might need compat analysis for. |
See whatwg/html#3111 for context.
|
I don't know what the appropriately reasonable and Web-compatible definition is. As for what Gecko does, the |
|
After further discussion with @hsivonen on IRC it turns out Firefox also exposes a "decode without BOM handling" hook (not present in the Encoding Standard) used exclusively for this (as far as web exposure goes). It seems however that other browsers are somewhat inconsistent here. I cannot browse to a U+00FF ID using (The reason why I think that will make more legacy pages work is that the URL parser now UTF-8 encodes. It used to use the local encoding. However, decoding still uses the local encoding, which likely broke legacy pages. If they haven't been updated, switching to decoding with UTF-8 will make them work again.) |
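The encode/decode mismatch described in the parenthetical can be sketched in Python. This is only an illustration under stated assumptions: `quote` stands in for the URL parser's UTF-8 percent-encoding of the fragment, and windows-1252 (`cp1252`) is picked as a hypothetical "local encoding"; neither is taken from any browser's code.

```python
from urllib.parse import quote, unquote_to_bytes

# An element ID consisting of U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS).
element_id = "\u00ff"

# The URL parser percent-encodes the fragment as UTF-8 (assumption: quote()
# with its default UTF-8 encoding models that step).
encoded = quote(element_id, safe="")

# Percent-decoding gives back the UTF-8 bytes 0xC3 0xBF.
raw = unquote_to_bytes(encoded)

# Decoding with UTF-8 round-trips to the original ID.
decoded_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with a legacy "local encoding" (assumed windows-1252 here)
# yields two different characters, so the ID lookup would miss.
decoded_legacy = raw.decode("cp1252")

print(encoded)
print(decoded_utf8 == element_id)
print(decoded_legacy == element_id)
```

Since the encoder side is always UTF-8, only a UTF-8 decoder gets back the ID the page author wrote.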
|
@tkent-google any thoughts on this? You didn't cover this in #2902 so it'd be good to know what you think of exclusively using UTF-8. |
|
It seems Google Chrome doesn't support non-UTF8 percent decoding for scroll-to-the-fragment, and I couldn't find any bug reports about non-UTF8 encoding support in our bug database. So I don't oppose exclusively using UTF-8. |
|
@tkent-google it doesn't seem that Chrome just runs UTF-8 decode though and calls it a day, per my tests in web-platform-tests/wpt#8723. Is there anything else going on? |
|
I think the test results indicate Chrome runs UTF-8 decode for percent-encoded fragments.
|
|
@tkent-google ah, so if you cannot decode you try to do a literal match? E.g., it would end up matching |
No. Literal matching is tried before percent decoding. That matches the current algorithm to determine "the indicated part of the document". Correction of the previous comment:
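The lookup order described here (literal match first, percent decoding second) can be sketched as follows. This is a simplified model rather than Chrome's or the spec's implementation: `find_indicated_id` and the `ids` set are hypothetical, and name-attribute matching and the special "top" fragment are left out.

```python
from typing import Optional, Set
from urllib.parse import unquote_to_bytes


def find_indicated_id(ids: Set[str], fragment: str) -> Optional[str]:
    """Return the ID the fragment points at, or None if there is none.

    `ids` is a hypothetical stand-in for the set of element IDs in the
    document.
    """
    # Step 1: literal match against the fragment as-is.
    if fragment in ids:
        return fragment
    # Step 2: percent-decode, then UTF-8 decode (never failing), and retry.
    decoded = unquote_to_bytes(fragment).decode("utf-8", errors="replace")
    if decoded in ids:
        return decoded
    return None


# The literal ID wins when both the literal and decoded forms exist.
print(find_indicated_id({"%C3%BF", "\u00ff"}, "%C3%BF"))
```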
|
|
The way that seems to work though is that anything that was already UTF-8 decoded is kept. I don't think we want to standardize on that. That is, for Falling back to https://infra.spec.whatwg.org/#isomorphic-decode when UTF-8 decode fails seems reasonable, but we should do it across the entire input string. |
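The fallback suggested above, applied across the entire input, could look roughly like this in Python. It is a sketch of the proposal only: the function name is invented, and Python's `latin-1` codec is used as a stand-in for isomorphic decode since it maps each byte to the code point of the same value.

```python
from urllib.parse import unquote_to_bytes


def decode_with_isomorphic_fallback(fragment: str) -> str:
    """Percent-decode, try strict UTF-8, else isomorphic-decode everything."""
    raw = unquote_to_bytes(fragment)
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Fall back for the whole byte sequence, not just the invalid part,
        # so no partially UTF-8 decoded prefix is kept.
        return raw.decode("latin-1")


# Valid UTF-8 decodes as UTF-8; a lone 0xFF falls back to U+00FF.
print(decode_with_isomorphic_fallback("%C3%BF") == "\u00ff")
print(decode_with_isomorphic_fallback("%FF") == "\u00ff")
# Mixed input: the valid prefix is NOT kept as U+00FF, every byte is mapped.
print(decode_with_isomorphic_fallback("%C3%BF%FF") == "\u00c3\u00bf\u00ff")
```

The mixed case is where this differs from keeping whatever was already UTF-8 decoded.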
|
@tkent-google would such a change be acceptable? |
It's acceptable. Other browser vendors might have different opinions. I just added counters to Chrome for the mixed decoding case, the isomorphic decoding case, and so on. http://crbug.com/802988 |
- url_util.*
  url::DecodeURLEscapeSequences() returns what encodings are applied on decoding.
- KURL.*
  Add optional DecodeURLResult argument to blink::DecodeURLEscapeSequences().
- web_feature.mojom, LocalFrameView.cpp, and enums.xml
  Add UseCounters.

These UseCounters are for the discussion in whatwg/html#3111.

Bug: 802988
Change-Id: Ie171212a2ca97ee5dc0e5bb4eeb98463865e5ab3
Reviewed-on: https://chromium-review.googlesource.com/869696
Reviewed-by: Mike West <mkwst@chromium.org>
Commit-Queue: Kent Tamura <tkent@chromium.org>
Cr-Commit-Position: refs/heads/master@{#530004}
|
Thanks for adding the ScrollToFragment* counters to https://www.chromestatus.com/metrics/feature/timeline/popularity! I guess we'll revisit this issue in several months to see what options we have. |
Currently data/graphs in chromestatus.com are broken. Data from the internal source: ScrollToFragmentRequested: 30.78% Note:
Summary:
|
Also stop using "UTF-8 decode without BOM or fail" and instead use "UTF-8 decode without BOM" for fragment identifiers. Tests: web-platform-tests/wpt#8723.
298a597 to a53dfde
|
Okay, I think that means we should proceed with this PR as-is. I rebased it so OP now includes links to readable previews/diffs and the commit message links to the tests. |
|
@domenic do you want to review this? |
domenic
left a comment
Editorially LGTM. I've gotten a little lost on what the tests/browser bugs situation is here, but I trust you'll take care of it.
|
Thanks. @tkent-google, could you review web-platform-tests/wpt#8723? This will require subtle changes to all browsers. I will file bugs once all the bits are reviewed. |
* Navigation fragment decode and encodings
  See whatwg/html#3111 for context.
* make tests more usable
* address review feedback
… a=testonly
Automatic update from web-platform-tests
Navigation fragment decode and encodings (#8723)
* Navigation fragment decode and encodings
  See whatwg/html#3111 for context.
* make tests more usable
* address review feedback
--
wpt-commits: 5b878a1e5de29aa4e68c48e0122878f983f036ff
wpt-pr: 8723
Also stop using "UTF-8 decode without BOM or fail" and instead use "UTF-8 decode without BOM" for fragment identifiers.
Tests: ...
/browsing-the-web.html ( diff )
/infrastructure.html ( diff )