Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

navigator.registerProtocolHandler specification divergent from both implementations #3377

Closed
bsittler opened this issue Jan 19, 2018 · 21 comments · Fixed by #5524
Closed

navigator.registerProtocolHandler specification divergent from both implementations #3377

bsittler opened this issue Jan 19, 2018 · 21 comments · Fixed by #5524
Labels
interop Implementations are not interoperable with each other topic: custom protocols

Comments

@bsittler
Copy link

For navigator.registerProtocolHandler the standard says:

To get the escaped version of the absolute URL of the content in question, the user agent must replace every character in that absolute URL that is not a character in the URL default encode set with the result of UTF-8 percent encoding that character.

However this does not seem to agree with either of the extant implementations, both of which %-escape % itself in the resulting URL (to %25) and also vary in other particulars (for instance, it seems + is escaped, presumably to avoid its conversion to space in handlers expecting query strings.)

@domenic
Copy link
Member

domenic commented Jan 19, 2018

Heh, @mgiuca just noted this over in whatwg/url#369 this morning. I guess it's good to have a tracking issue here too. Looking forward to the solution.

@bsittler
Copy link
Author

bsittler commented Jan 19, 2018

There is an in-progress change that would align Chrome more closely with the current standard that likely won't land (at least in its current form) due to potential for breaking existing usage: https://crrev.com/c/835908

@mgiuca
Copy link
Contributor

mgiuca commented Jan 21, 2018

Interesting. I was planning to file this exact issue as a follow-up to whatwg/url#369.

In my opinion, the current implementation (in both Chrome and Firefox) is correct, and the spec needs to be updated. See my recommendation in the URL spec bug. Instead of updating HTML, I would prefer to update the URL spec to specify "default encode set" properly, and then HTML will automatically be correct.

@annevk
Copy link
Member

annevk commented Jan 22, 2018

See also #481. Might be nice to tackle that too when fixing this.

@bsittler
Copy link
Author

Chromium issue tracker which includes some preliminary interop information from Firefox (Gecko/Necko?) and Blink: https://crbug.com/795919

If there are other extant implementations of this API I am unaware of them, so please enlighten me :)

@annevk
Copy link
Member

annevk commented May 7, 2020

<a href>test</a>
<script>
result = []
for (i = 1; i < 0x82; i++) { // start at 1 for Firefox, see below
  result.push(String.fromCharCode(i));
}
document.querySelector("a").href = "web-cat:begin" + result.join("") + "end";
</script>

Firefox

/web%2Bcat%3Abegin%2501%2502%2503%2504%2505%2506%2507%2508%250B%250C%250E%250F%2510%2511%2512%2513%2514%2515%2516%2517%2518%2519%251A%251B%251C%251D%251E%251F%20!%2522%23%24%25%26'()*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%2560abcdefghijklmnopqrstuvwxyz%7B%7C%7D~%257F%25C2%2580%25C2%2581end

Chrome

/web%2Bcat%3Abegin%2501%2502%2503%2504%2505%2506%2507%2508%250B%250C%250E%250F%2510%2511%2512%2513%2514%2515%2516%2517%2518%2519%251A%251B%251C%251D%251E%251F+!%22%23%24%25%26'()*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~%7F%25C2%2580%25C2%2581end

Analysis:

  • Firefox fails navigation if the custom URL it translates into a HTTPS URL includes %00, even though it would end up as %2500. Chrome doesn't. The standard does make note of being careful around U+0000 but I'm not sure how justified that is. If you want to navigate an endpoint to %00 you could do so directly. (This is basically Failure from resolving the url is ignored when using custom scheme and content handlers #481 and I think we should consider this a bug in Firefox and make the standard assert there cannot be a failure.)
  • Tab and newlines are stripped, as expected. (If you escaped them they would survive.)
  • Firefox uses %20 for space, Chrome +. I would consider this a bug in Chrome as this is the only bit where it seems to share logic with HTML forms.
  • Firefox ends up escaping U+0022 ("), U+0060 (`), and U+007F DELETE twice, despite only escaping U+007F DELETE when parsing custom URLs. I think Firefox is correct here for U+007F as that would have been escaped when parsing and serializing the custom URL. Somewhat surprised with Chrome's result for that one.
  • The encoded code points would be 0-20, 22-26, 2B, 2C, 2F, 3A-40, 5B-5E, 60, 7B-7D, 7F-Infinity (remember that not all of these are observable for the custom URL case as the input is a URL which is already parsed and serialized).
  • This would be the userinfo percent-encode set + 24-26, 2B, 2C. We could call it the "external string encode set" or some such.

I have no idea how to create automated tests for this. Thoughts? I suspect I'll just create manual tests for now, which might be tricky enough.

@domenic
Copy link
Member

domenic commented May 7, 2020

For automated tests, would it be possible to navigate to a protocol handler page that postMessages to the parent on success? (You'd you'd get a network error on failure, which would probably require waiting for a timeout...)

@annevk
Copy link
Member

annevk commented May 7, 2020

I'm not sure how you'd register the scheme, but that seems like a possible setup for the manual test.

@domenic
Copy link
Member

domenic commented May 7, 2020

Oh, right, this requires user confirmation. Probably that's a reasonable thing to ask for support from the WPT infra folks on. I bet most browsers already have a command line flag that allows such prompts to auto-confirm, for their own tests.

annevk added a commit that referenced this issue May 9, 2020
This makes things a bit more explicit and aligns what was "escaped version" with implementations.

Corresponding URL Standard PR: ...

Tests: ...

Closes #3377.
@annevk
Copy link
Member

annevk commented May 12, 2020

I had not yet realized that the above set I recommended is identical to encodeURIComponent(). That's a nice surprise! whatwg/url#513 defines it and #5524 updates HTML to make use of it.

annevk added a commit that referenced this issue May 12, 2020
This makes things a bit more explicit and roughly aligns what was "escaped version" with implementations.

Corresponding URL Standard PR: whatwg/url#513.

Tests: web-platform-tests/wpt#23504.

Closes #3377.
@annevk
Copy link
Member

annevk commented May 13, 2020

web-platform-tests/wpt#23504 now has detailed manual tests (unfortunately not all run in Chrome due to custom schemes not working with service workers, but enough do). The results are similar to what I described above, but I'll summarize for clarity.

The failures in Firefox are:

  • Does not handle %00 appearing in the URL with a custom URL well.
  • Percent-encodes U+0022 and U+0060 twice somehow, despite not percent-encoding them for non-special URLs (as is correct per the URL parser).

The failures in Chrome are:

  • Percent-encodes U+0020 as + rather than %20.
  • (Not a failure with this feature, but it prevents passing the text: a non-special URL does not get U+007F percent-encoded leading it to be only percent-encoded once (i.e., %7F instead of %257F).

Firefox would be happy to align with the proposed design. @bsittler @mgiuca Would Chrome?

@annevk
Copy link
Member

annevk commented May 14, 2020

@ericlaw1979 perhaps you have thoughts on #3377 (comment)?

@annevk
Copy link
Member

annevk commented May 19, 2020

@mfreed7 is this your responsibility at Google as well now? Could you weigh in on the above comment or find someone who can?

@mfreed7
Copy link
Contributor

mfreed7 commented May 19, 2020

@annevk this wouldn't be me, but perhaps @hayatoito could either comment or find the right person.

@foolip
Copy link
Member

foolip commented Jun 24, 2020

@mgiuca
Copy link
Contributor

mgiuca commented Jul 22, 2020

@annevk :

The failures in Chrome are:

  • Percent-encodes U+0020 as + rather than %20.

Changing this to encode U+0020 as %20 SGTM. I can't see how that would break sites, since they have to decode all the other %-escapes, I don't see why a site would special-case not decoding %20 properly.

  • (Not a failure with this feature, but it prevents passing the text: a non-special URL does not get U+007F percent-encoded leading it to be only percent-encoded once (i.e., %7F instead of %257F).

I don't understand this one. Why would you encode U+007F as %257F? (i.e., why not just encode it once like every other character)? Can you elaborate why you think that would be correct behaviour?

@annevk
Copy link
Member

annevk commented Jul 22, 2020

@mgiuca the URL parser would have already encoded U+007F as %7F. It's a bug in Chrome's URL parser and somewhat separate from this.

@fred-wang
Copy link
Contributor

Hi Anne, thanks for working on this! I'm not an expert in URL parsing, but after reading the proposed tests and spec changes, this looks good to me overall.

Regarding U+007F, you are essentially saying that https://url.spec.whatwg.org/#concept-basic-url-parser will percent-encode some characters in various supersets of https://url.spec.whatwg.org/#c0-control-percent-encode-set (C0 controls and all code points greater than U+007E), right?

(Incidentally, I noticed that there are two notes mentioning "greater than U+007F DELETE" or "higher than U+007F DELETE", maybe including U+007F was included later and not updated everywhere?)

Here is a simple test with registerProtocolHandler which behaves differently in Firefox and Chrome:

<script>
    let url = new URL(`web+foo:${String.fromCharCode(0x1F)}${String.fromCharCode(0x7F)}`);
    console.log(url.pathname)
</script>

Firefox's output "%1F%7F" sounds correct since IIRC, the parser will do:

  • scheme start state: append 'w' to buffer and move to scheme state
  • scheme state: append "eb+foo" to buffer and when arriving on the colon, set the scheme to "web+foo", reset buffer and move to "cannot-be-a-base-URL path state".
  • cannot-be-a-base-URL path state: U+1F and U+7F are not URL code points so that will cause a validation error (without termination) and then they belong to the C0 control percent-encode set so they will be percent-encoded and appended to path[0].

Regarding U+0020, I don't have any strong opinion on this but as I see this is indeed what is expected from your proposal to use this new "component percent-encode set" which includes the "query percent-encode set" and so U+0020, so that test looks correct.

Regarding service worker, Domenic mentioned elsewhere https://bugs.chromium.org/p/chromium/issues/detail?id=522370 ; I guess it's fine to keep them if it's just a bug in Chrome.

Regarding automating tests from my understanding of chrome's implementation I believe the UI permission request, handler preference registration and conversion of the URL with custom scheme are all done on the chrome side (not blink), so I think we would need a full chrome brower running (like what is done with the ./wpt run) rather just a blink content shell (like what is done with python third_party/blink/tools/run_web_tests.py). So even if we could use testdriver to emulate user clicks and introduce a new API to register a protocol or auto-accept permission requests, I don't think that would exercise the code in chrome/common/custom_handlers/protocol_handler.cc anyway ; and Anne's manual tests are fine for WPT coverage of the spec. In Chrome, we will likely need to do a C++ ui test like what I'm doing in https://chromium-review.googlesource.com/c/chromium/src/+/2287304/25/extensions/browser/api/protocol_handler_interactive_uitest.cc in order to obtain an automated test.

Finally, note that BroadcastChannel is not supported in WebKit (and probably not instrumental for your test) see https://bugs.webkit.org/show_bug.cgi?id=161472 ; just mentinoning it but this is probably not relevant given there is no plan to support registerProtocolHandler in WebKit.

@fred-wang
Copy link
Contributor

Here is a testcase to check percent escaping in https://url.spec.whatwg.org/#concept-basic-url-parser

<script>
    let c = String.fromCharCode(0x7F);
    console.log((new URL(`https://www.example.com/path{${String.fromCharCode(0x7F)}path.html?query'${c}=query#fragment<${c}fragment`)).href)
    console.log(new URL(`https://[${String.fromCharCode(0x7F)}@example.org`, "https://www.google.org").href);
    console.log((new URL(`web+ext:cannot-be-a-base-URL${String.fromCharCode(0x1F)}${String.fromCharCode(0x7F)}cannot-be-a-base-URL`)).href);
</script>

Chrome does not properly handle escaping of 0+7F in https://url.spec.whatwg.org/#cannot-be-a-base-url-path-state and https://url.spec.whatwg.org/#fragment-state ; I investigated a bit and I'm able to fix this by modifying these two lines:

https://source.chromium.org/chromium/chromium/src/+/master:url/url_canon_pathurl.cc;drc=19b1e5e4e1914b5b7464062ec300b817d2bac53d;l=35
https://source.chromium.org/chromium/chromium/src/+/master:url/url_canon_etc.cc;drc=01c25d47d2d22456368363e576083d766eedf8f6;l=280 (for some reason the last entry is missing)

@mgiuca
Copy link
Contributor

mgiuca commented Jul 29, 2020

@mgiuca the URL parser would have already encoded U+007F as %7F. It's a bug in Chrome's URL parser and somewhat separate from this.

Oh, I see, so we expect double-encoding of any reserved character, once as part of the basic URL parsing and then again to insert it into the parent URL.

If Chrome does that in general, but just doesn't do it for U+007F as an unrelated issue, then I'm fine with that (and we can look at that as just part of Chrome's general minor lack of adherence to the spec). I think Chrome is generally fine with this change.

@fred-wang
Copy link
Contributor

OK, so the U+007F chromium bug is known, it's actually https://bugs.chromium.org/p/chromium/issues/detail?id=809852 ; I uploaded a CL with WPT tests.

Regarding U+0020, the fix would be straightforward, just pass use_plus=false here: https://source.chromium.org/chromium/chromium/src/+/master:chrome/common/custom_handlers/protocol_handler.cc;drc=e8e7b4663189369a0ec9117c7fac954fdbe219d1;l=101

After these changes, I verified that Anne's manual tests without service worker pass in Chrome.

I don't know if I can speak for Chromium dev (I will have to send a formal intent-to for these two changes above) but at least I can say Igalia is interested in fixing these two bugs.

annevk added a commit that referenced this issue Aug 4, 2020
This makes things a bit more explicit and roughly aligns what was "escaped version" with implementations.

Corresponding URL Standard PR: whatwg/url#513.

Tests: web-platform-tests/wpt#23504.

Closes #3377.
blueboxd pushed a commit to blueboxd/chromium-legacy that referenced this issue Aug 19, 2020
Currently, URLs calculated by protocol handlers escape spaces as "+".
This CL changes this to use "%20" instead, aligning with Firefox and
the specification.

For details, read the following links:
https://chromestatus.com/feature/5678518908223488
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/osabCTBhDSs
https://bugs.chromium.org/p/chromium/issues/detail?id=795919
https://html.spec.whatwg.org/multipage/system-state.html#custom-handlers
whatwg/html#3377

Bug: 1110842
Change-Id: I31e2250c56d3c4654d5a8655ead953af33b5fb5c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2324126
Reviewed-by: Dominick Ng <dominickn@chromium.org>
Commit-Queue: Frédéric Wang <fwang@igalia.com>
Cr-Commit-Position: refs/heads/master@{#799541}
mfreed7 pushed a commit to mfreed7/html that referenced this issue Sep 11, 2020
This makes things a bit more explicit and roughly aligns what was "escaped version" with implementations.

Corresponding URL Standard PR: whatwg/url#513.

Tests: web-platform-tests/wpt#23504.

Closes whatwg#3377.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop Implementations are not interoperable with each other topic: custom protocols
Development

Successfully merging a pull request may close this issue.

7 participants