Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #301: detail how javascript: return values become response bodies #1107

Merged
merged 5 commits into from
Apr 26, 2016

Conversation

domenic
Copy link
Member

@domenic domenic commented Apr 22, 2016

This also does some minor cleanup to clarify that "run a classic script"
returns undefined when scripting is disabled.

This also does some minor cleanup to clarify that "run a classic script"
returns undefined when scripting is disabled.
data-x="concept-response-status">status</span> is <code data-x="">204</code>.</p>

<p>Otherwise, the result of obtaining the resource for the URL is a <span
data-x="concept-response">response</span> whose <span
data-x="concept-response-header-list">header list</span> consists of <code
data-x="">Content-Type</code>/<code>text/html</code> and whose <span
data-x="concept-response-body">body</span> is the return value converted to a string
value.</p>
data-x="concept-response-body">body</span> is <span>ToString</span>(<var>result</var>).</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems browsers are not really interoperable around this (doh). E.g., http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4099. Firefox seems to serialize the object somehow without invoking ToString(). Chrome refuses to navigate.

Copy link
Member Author

@domenic domenic Apr 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge also refuses to navigate. Amazing.

Note that the undefined case also refuses to navigate. (Is that also true for normal links that return 204s, or is there some special casing that I don't understand?)

It seems like throwing doesn't actually make a difference. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4100 also refuses to navigate in Chrome and Edge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so there was an issue with our examples: the JavaScript program

{toString: () => { return 5; }}

is actually a block with a label named toString, whose completion value is () => { return 5; }. So Firefox's answer is consistent with ToString().

If you try e.g. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4101 in Firefox, it gives 5, as expected from ToString() on the result. So that's good.

But if you try it in Chrome or Edge, they both refuse to render again, so that's not great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, 204 not navigating is normal I think. Not sure how accurately that is defined overall.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DigiTec could you fill us in on how Edge converts javascript: URL completion values into strings to use as response bodies? So far Gecko seems straightforward but Blink and EdgeHTML are mysterious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: only Firefox reports uncaught exceptions to the console (http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4104), and presumably to window.onerror.

Should we spec swallowing them to go with the majority? That feels strictly worse, which is sad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Gecko does in terms of conversion to bytes is that it examines the returned string to see whether all charCodeAt() values are 255 or less. If so, the string is treated as byte-inflated ISO-8859-1 data (and a response is synthesized which has "ISO-8859-1" as its encoding, with the byte-deflated bytes as data). This allows generation of non-text data, dating back to when we supported javascript: in <img>, say.

Otherwise the return value is treated as a sequence of UTF-16 code units encoding a Unicode string and converted to UTF-8 bytes (insert handwaving about what happens to unpaired surrogates here). The synthesized response has "UTF-8" as its encoding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Blink does is https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/core/v8/V8StringResource.cpp&sq=package:chromium&l=101&rcl=1461583037 which I haven't tried to wrap my head around.

I did try http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4105 in every browser and they all gave me back the string verbatim, so that didn't reveal much.

Copy link

@DigiTec DigiTec Apr 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Edge we specify two code pages for transformation. The first is the calculated code page which is always CP_UCS_2 which translates to Unicode, ISO 10646 according to comments. We then specify as the source code page CPSRC_NATIVEDATA which means native data, known to be CP_UCS_2 so don't allow any sort of fallbacks.

I also confirmed that our current behavior in Edge for the handling of the return value was put in place to match WebKit behavior and that legacy IE does have much more complicated behaviors still (handling of boolean return values, etc...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@domenic The code you point to in V8StringResource seems to be about converting V8 strings to WebCore strings, not conversion to bytes.

I don't think Blink does conversion to bytes at all here. See the FIXME comment in https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentWriter.cpp&l=75&ct=xref_jump_to_def&cl=GROK&gsn=appendReplacingData as called from https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentLoader.cpp&sq=package:chromium&l=684&rcl=1461583037 (DocumentLoader::replaceDocumentWhileExecutingJavaScriptURL).

In terms of test matrix, if we need to determine this in a black-box way, it seems to me that the following are somewhat useful cases to test:

  1. Return string is all ASCII (charCodeAt() < 128 for all indices).
  2. Return string has charCodeAt() < 256 for all indices, but does not fall into case 1.
  3. Return string has does not have any charCodeAt values corresponding to UTF-16 surrogate code unit values, but does not fall into cases 1 or 2.
  4. Return string has surrogate code units which are all paired properly.
  5. Return string has unpaired surrogate code units.

Each of these should be tested in situations in which the source of the javascript: URL is either UTF-8 or ISO-8859-1/Windows-1252. That is, either an iframe in a document with that encoding with src pointing to a javascript: URL, or a link in a document with that encoding with href pointing to a javascript: URL. Probably test both scenarios.

The tests should look for the following things:

  1. What is the document.body.textContent of the resulting document?
  2. What is the document.charset of the resulting document?

(Seems easier to fix this all in one PR. No need for this to be a
distinct commit either imo.)
@domenic
Copy link
Member Author

domenic commented Apr 25, 2016

I've uploaded new changes that bring this more in line with the browser majority and also adds a warning about the JS string -> byte conversion, saying we're still looking into that. I'd kind of like to merge that and work on figuring out the string -> byte business in a new PR.

@bzbarsky
Copy link
Contributor

Just to check, that's basically just adding a Type() == string guard and not saying anything about how exceptions during evaluation are handled, right?

@domenic
Copy link
Member Author

domenic commented Apr 25, 2016

No, exceptions are handled by "If evaluation was unsuccessful, let result be undefined instead."

@bzbarsky
Copy link
Contributor

No, exceptions are handled by "If evaluation was unsuccessful, let result be undefined instead."

Ah, in terms of the navigation behavior. OK, good, I think that matches all UAs. What about the onerror/reporting behavior?

In Gecko, we basically report the exception then pretend like execution succeeded and returned undefined.

@bzbarsky
Copy link
Contributor

Oh, and not reporting the exception would be doable. Reporting it to the console but not firing onerror would be a lot more annoying.

@domenic
Copy link
Member Author

domenic commented Apr 25, 2016

I realized I was not testing the right thing previously with regard to error reporting. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4107 shows that Chrome, Edge, and Firefox all report the error. Which is good. That is also what the current patch set does, automatically; it calls "run a classic script" which, in the absence of flags, will report any thrown exceptions before returning undefined.

So I think the current patch matches everyone on error reporting behavior and everyone-but-Gecko on non-string behavior.

@annevk
Copy link
Member

annevk commented Apr 26, 2016

And the current patch doesn't address the string-or-byte question, though I guess if nobody but Gecko does that we might want to leave that out. It's also a lot less useful now you can no longer use javascript URLs everywhere (e.g., it does not work in <img>).

the bytes that comprise a <span data-x="concept-response-body">response body</span> is
not yet specified, pending further investigation into user agent behavior. See <a
href="https://github.com/whatwg/html/pull/1107/files/8784407eea43171053a5fe75890b779edaabac20#r60947977">discussion
in pull request #1107</a>.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to merge this but I think that if we want to investigate aligning with Gecko here (not sure why if Gecko is the only browser that does this) we should have a new issue for that and point there. Accountability is harder to track otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #1129. Will merge now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@domenic domenic merged commit 9997cd9 into master Apr 26, 2016
@domenic domenic deleted the javascript-convert-to-string branch April 26, 2016 13:32
@smaug----
Copy link
Collaborator

Just curious, not objecting anything here, but do we know why non-Gecko engines don't do the obvious thing and convert the return value effectively to string? Like did anyone try to first file implementation bugs before this spec bug?
(This change will break my calculator, which is to write javascript: 1+1, after this change I'll need to write javascript: 1+1+"")

@domenic
Copy link
Member Author

domenic commented Apr 27, 2016

I kind of assumed such reasons were lost in the mists of time. It's not clear to me any particular behavior is more obvious than any other. In JS algorithms, ToString() logic is prevalent, but for transporting JS values into response bodies in the network stack, there's no real precedent.

For such an edge case, changing 1 browser seems like a better path to interop than changing 3.

@tabatkins
Copy link
Collaborator

(This change will break my calculator, which is to write javascript: 1+1, after this change I'll need to write javascript: 1+1+"")

Or just retrain your fingers to type Ctrl+Shift+J, which is less work than Ctrl+T + "javascript:". ^_^

@bzbarsky
Copy link
Contributor

FWIW, this is why we shouldn't be using github for issue tracking: all the relevant discussion and data here was in the line comments on the PR, and github happily clobbered that once an updated version was pushed. :(

@annevk
Copy link
Member

annevk commented Apr 27, 2016

@bzbarsky you can still get to those comments quite easily by expanding the relevant thread. It's a rather great feature when the line comments are nitpicks. It's not super useful if substantive discussion happens there, but perhaps we should learn to not have substantive discussion in line comment threads.

@smaug----
Copy link
Collaborator

Uh, I totally missed all the relevant discussion here. /me finds which github link to click to see the interesting comments about this issue...

@bzbarsky
Copy link
Contributor

I think if you start at #1107 (comment) you should be able to read the relevant bits.

@bzbarsky
Copy link
Contributor

I'll add a wpt for this in https://bugzilla.mozilla.org/show_bug.cgi?id=1268047

domenic added a commit that referenced this pull request Jun 14, 2016
#1107 updated the algorithm to, per
the browser majority, give 204s for non-strings instead of only for
undefined. This updates the example following the normative algorithm
to account for that change as well.

Fixes #1415.
annevk pushed a commit that referenced this pull request Jun 16, 2016
#1107 updated the algorithm to, per
the browser majority, give 204s for non-strings instead of only for
undefined. This updates the example following the normative algorithm
to account for that change as well.

Fixes #1415.
@bzbarsky
Copy link
Contributor

@domenic I just did some testing, and at least Chrome and Safari do in fact convert all values to strings (but possibly to "" if the original value is not a string). This includes the undefined value. So this pull request specified something that matches at best only Edge and breaks sites, as we discovered when we implemented it in Gecko. See #1895

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants