Fix #301: detail how javascript: return values become response bodies #1107

domenic · 2016-04-22T21:44:03Z

This also does some minor cleanup to clarify that "run a classic script"
returns undefined when scripting is disabled.

This also does some minor cleanup to clarify that "run a classic script" returns undefined when scripting is disabled.

annevk · 2016-04-23T03:20:57Z

source

        data-x="concept-response-status">status</span> is <code data-x="">204</code>.</p>

        <p>Otherwise, the result of obtaining the resource for the URL is a <span
        data-x="concept-response">response</span> whose <span
        data-x="concept-response-header-list">header list</span> consists of <code
        data-x="">Content-Type</code>/<code>text/html</code> and whose <span
-        data-x="concept-response-body">body</span> is the return value converted to a string
-        value.</p>
+        data-x="concept-response-body">body</span> is <span>ToString</span>(<var>result</var>).</p>


It seems browsers are not really interoperable around this (doh). E.g., http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4099. Firefox seems to serialize the object somehow without invoking ToString(). Chrome refuses to navigate.

Edge also refuses to navigate. Amazing.

Note that the undefined case also refuses to navigate. (Is that also true for normal links that return 204s, or is there some special casing that I don't understand?)

It seems like throwing doesn't actually make a difference. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4100 also refuses to navigate in Chrome and Edge.

OK, so there was an issue with our examples: the JavaScript program

{toString: () => { return 5; }}

is actually a block with a label named toString, whose completion value is () => { return 5; }. So Firefox's answer is consistent with ToString().

If you try e.g. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4101 in Firefox, it gives 5, as expected from ToString() on the result. So that's good.

But if you try it in Chrome or Edge, they both refuse to render again, so that's not great.

Ah yes, 204 not navigating is normal I think. Not sure how accurately that is defined overall.

@DigiTec could you fill us in on how Edge converts javascript: URL completion values into strings to use as response bodies? So far Gecko seems straightforward but Blink and EdgeHTML are mysterious.

Also: only Firefox reports uncaught exceptions to the console (http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4104), and presumably to window.onerror.

Should we spec swallowing them to go with the majority? That feels strictly worse, which is sad.

What Gecko does in terms of conversion to bytes is that it examines the returned string to see whether all charCodeAt() values are 255 or less. If so, the string is treated as byte-inflated ISO-8859-1 data (and a response is synthesized which has "ISO-8859-1" as its encoding, with the byte-deflated bytes as data). This allows generation of non-text data, dating back to when we supported javascript: in <img>, say.

Otherwise the return value is treated as a sequence of UTF-16 code units encoding a Unicode string and converted to UTF-8 bytes (insert handwaving about what happens to unpaired surrogates here). The synthesized response has "UTF-8" as its encoding.

What Blink does is https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/core/v8/V8StringResource.cpp&sq=package:chromium&l=101&rcl=1461583037 which I haven't tried to wrap my head around.

I did try http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4105 in every browser and they all gave me back the string verbatim, so that didn't reveal much.

In Edge we specify two code pages for transformation. The first is the calculated code page which is always CP_UCS_2 which translates to Unicode, ISO 10646 according to comments. We then specify as the source code page CPSRC_NATIVEDATA which means native data, known to be CP_UCS_2 so don't allow any sort of fallbacks.

I also confirmed that our current behavior in Edge for the handling of the return value was put in place to match WebKit behavior and that legacy IE does have much more complicated behaviors still (handling of boolean return values, etc...)

@domenic The code you point to in V8StringResource seems to be about converting V8 strings to WebCore strings, not conversion to bytes.

I don't think Blink does conversion to bytes at all here. See the FIXME comment in https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentWriter.cpp&l=75&ct=xref_jump_to_def&cl=GROK&gsn=appendReplacingData as called from https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentLoader.cpp&sq=package:chromium&l=684&rcl=1461583037 (DocumentLoader::replaceDocumentWhileExecutingJavaScriptURL).

In terms of test matrix, if we need to determine this in a black-box way, it seems to me that the following are somewhat useful cases to test:

Return string is all ASCII (charCodeAt() < 128 for all indices).

Return string has charCodeAt() < 256 for all indices, but does not fall into case 1.

Return string has does not have any charCodeAt values corresponding to UTF-16 surrogate code unit values, but does not fall into cases 1 or 2.

Return string has surrogate code units which are all paired properly.

Return string has unpaired surrogate code units.

Each of these should be tested in situations in which the source of the javascript: URL is either UTF-8 or ISO-8859-1/Windows-1252. That is, either an iframe in a document with that encoding with src pointing to a javascript: URL, or a link in a document with that encoding with href pointing to a javascript: URL. Probably test both scenarios.

The tests should look for the following things:

What is the document.body.textContent of the resulting document?

What is the document.charset of the resulting document?

(Seems easier to fix this all in one PR. No need for this to be a distinct commit either imo.)

domenic · 2016-04-25T17:49:58Z

I've uploaded new changes that bring this more in line with the browser majority and also adds a warning about the JS string -> byte conversion, saying we're still looking into that. I'd kind of like to merge that and work on figuring out the string -> byte business in a new PR.

bzbarsky · 2016-04-25T18:12:08Z

Just to check, that's basically just adding a Type() == string guard and not saying anything about how exceptions during evaluation are handled, right?

domenic · 2016-04-25T18:27:20Z

No, exceptions are handled by "If evaluation was unsuccessful, let result be undefined instead."

bzbarsky · 2016-04-25T20:08:00Z

No, exceptions are handled by "If evaluation was unsuccessful, let result be undefined instead."

Ah, in terms of the navigation behavior. OK, good, I think that matches all UAs. What about the onerror/reporting behavior?

In Gecko, we basically report the exception then pretend like execution succeeded and returned undefined.

bzbarsky · 2016-04-25T20:16:29Z

Oh, and not reporting the exception would be doable. Reporting it to the console but not firing onerror would be a lot more annoying.

domenic · 2016-04-25T22:13:11Z

I realized I was not testing the right thing previously with regard to error reporting. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4107 shows that Chrome, Edge, and Firefox all report the error. Which is good. That is also what the current patch set does, automatically; it calls "run a classic script" which, in the absence of flags, will report any thrown exceptions before returning undefined.

So I think the current patch matches everyone on error reporting behavior and everyone-but-Gecko on non-string behavior.

annevk · 2016-04-26T07:26:04Z

And the current patch doesn't address the string-or-byte question, though I guess if nobody but Gecko does that we might want to leave that out. It's also a lot less useful now you can no longer use javascript URLs everywhere (e.g., it does not work in <img>).

annevk · 2016-04-26T11:22:47Z

source

+        the bytes that comprise a <span data-x="concept-response-body">response body</span> is
+        not yet specified, pending further investigation into user agent behavior. See <a
+        href="https://github.com/whatwg/html/pull/1107/files/8784407eea43171053a5fe75890b779edaabac20#r60947977">discussion
+        in pull request #1107</a>.</p>


I was going to merge this but I think that if we want to investigate aligning with Gecko here (not sure why if Gecko is the only browser that does this) we should have a new issue for that and point there. Accountability is harder to track otherwise.

opened #1129. Will merge now.

smaug---- · 2016-04-27T00:32:05Z

Just curious, not objecting anything here, but do we know why non-Gecko engines don't do the obvious thing and convert the return value effectively to string? Like did anyone try to first file implementation bugs before this spec bug?
(This change will break my calculator, which is to write javascript: 1+1, after this change I'll need to write javascript: 1+1+"")

domenic · 2016-04-27T01:03:25Z

I kind of assumed such reasons were lost in the mists of time. It's not clear to me any particular behavior is more obvious than any other. In JS algorithms, ToString() logic is prevalent, but for transporting JS values into response bodies in the network stack, there's no real precedent.

For such an edge case, changing 1 browser seems like a better path to interop than changing 3.

tabatkins · 2016-04-27T05:47:38Z

(This change will break my calculator, which is to write javascript: 1+1, after this change I'll need to write javascript: 1+1+"")

Or just retrain your fingers to type Ctrl+Shift+J, which is less work than Ctrl+T + "javascript:". ^_^

bzbarsky · 2016-04-27T14:54:00Z

FWIW, this is why we shouldn't be using github for issue tracking: all the relevant discussion and data here was in the line comments on the PR, and github happily clobbered that once an updated version was pushed. :(

annevk · 2016-04-27T14:57:41Z

@bzbarsky you can still get to those comments quite easily by expanding the relevant thread. It's a rather great feature when the line comments are nitpicks. It's not super useful if substantive discussion happens there, but perhaps we should learn to not have substantive discussion in line comment threads.

smaug---- · 2016-04-27T15:47:07Z

Uh, I totally missed all the relevant discussion here. /me finds which github link to click to see the interesting comments about this issue...

bzbarsky · 2016-04-27T15:54:26Z

I think if you start at #1107 (comment) you should be able to read the relevant bits.

bzbarsky · 2016-04-27T19:44:14Z

I'll add a wpt for this in https://bugzilla.mozilla.org/show_bug.cgi?id=1268047

#1107 updated the algorithm to, per the browser majority, give 204s for non-strings instead of only for undefined. This updates the example following the normative algorithm to account for that change as well. Fixes #1415.

bzbarsky · 2016-10-13T16:04:46Z

@domenic I just did some testing, and at least Chrome and Safari do in fact convert all values to strings (but possibly to "" if the original value is not a string). This includes the undefined value. So this pull request specified something that matches at best only Edge and breaks sites, as we discovered when we implemented it in Gecko. See #1895

Fix #301: detail how javascript: return values become response bodies

8784407

This also does some minor cleanup to clarify that "run a classic script" returns undefined when scripting is disabled.

annevk reviewed Apr 23, 2016
View reviewed changes

also fix #945

2eb2810

(Seems easier to fix this all in one PR. No need for this to be a distinct commit either imo.)

annevk mentioned this pull request Apr 24, 2016

Define the HTTPS state of the response for navigating to javascript: URLs #945

Closed

Update to check Type and add warning about string -> bytes conversion

fb6f1ee

missing comma

57b59e5

annevk reviewed Apr 26, 2016
View reviewed changes

domenic mentioned this pull request Apr 26, 2016

javascript URL: define JS string-to-byte conversion better #1129

Closed

Update issue reference

6d1be1f

domenic merged commit 9997cd9 into master Apr 26, 2016

domenic deleted the javascript-convert-to-string branch April 26, 2016 13:32

domenic mentioned this pull request Jun 14, 2016

Fix javascript: URL example explanation #1421

Merged

domenic mentioned this pull request Oct 13, 2016

javascript URLs that do not return strings is apparently wrong #1896

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #301: detail how javascript: return values become response bodies #1107

Fix #301: detail how javascript: return values become response bodies #1107

domenic commented Apr 22, 2016

annevk Apr 23, 2016

domenic Apr 23, 2016 •

edited

Loading

domenic Apr 23, 2016

annevk Apr 23, 2016

domenic Apr 23, 2016

domenic Apr 25, 2016

bzbarsky Apr 25, 2016

domenic Apr 25, 2016

DigiTec Apr 25, 2016 •

edited

Loading

bzbarsky Apr 25, 2016

domenic commented Apr 25, 2016

bzbarsky commented Apr 25, 2016

domenic commented Apr 25, 2016

bzbarsky commented Apr 25, 2016

bzbarsky commented Apr 25, 2016

domenic commented Apr 25, 2016

annevk commented Apr 26, 2016

annevk Apr 26, 2016

domenic Apr 26, 2016

annevk Apr 26, 2016

smaug---- commented Apr 27, 2016

domenic commented Apr 27, 2016

tabatkins commented Apr 27, 2016

bzbarsky commented Apr 27, 2016

annevk commented Apr 27, 2016

smaug---- commented Apr 27, 2016

bzbarsky commented Apr 27, 2016

bzbarsky commented Apr 27, 2016

bzbarsky commented Oct 13, 2016

Fix #301: detail how javascript: return values become response bodies #1107

Fix #301: detail how javascript: return values become response bodies #1107

Conversation

domenic commented Apr 22, 2016

Choose a reason for hiding this comment

domenic Apr 23, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DigiTec Apr 25, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

domenic commented Apr 25, 2016

bzbarsky commented Apr 25, 2016

domenic commented Apr 25, 2016

bzbarsky commented Apr 25, 2016

bzbarsky commented Apr 25, 2016

domenic commented Apr 25, 2016

annevk commented Apr 26, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smaug---- commented Apr 27, 2016

domenic commented Apr 27, 2016

tabatkins commented Apr 27, 2016

bzbarsky commented Apr 27, 2016

annevk commented Apr 27, 2016

smaug---- commented Apr 27, 2016

bzbarsky commented Apr 27, 2016

bzbarsky commented Apr 27, 2016

bzbarsky commented Oct 13, 2016

domenic Apr 23, 2016 •

edited

Loading

DigiTec Apr 25, 2016 •

edited

Loading