Define speculative HTML parsing #5959

zcorpan · 2020-09-29T14:37:19Z

At least two implementers are interested (and none opposed):
- Chromium (@chrishtr, @mfreed7)
- Gecko (@hsivonen has fixed a bug already)
- WebKit (@othermaciej's comments: #5624 (comment) and #5624 (comment))
Tests are written and can be reviewed and commented upon at:
- HTML: Add tentative tests for speculative HTML parsing web-platform-tests/wpt#24521
Implementation bugs are filed:
- Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=1144176
- Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1674583
- Safari: https://bugs.webkit.org/show_bug.cgi?id=229709

(See WHATWG Working Mode: Changes for more details.)

/index.html ( diff )
/parsing.html ( diff )
/references.html ( diff )

chrishtr · 2020-10-29T19:51:58Z

@domenic could you review, and @mfreed7 also?

domenic

Overall quite solid, and I'd be pretty happy with merging it as-is. I tried to find ways to polish it and commented on those.

More implementer weigh-in would be ideal (specifically @hsivonen would be great), since I don't feel confident about that aspect of the review.

source

domenic · 2020-10-30T17:20:28Z

source

+
+    <p class="note">It is possible that the same markup is seen multiple times from the
+    <span>speculative HTML parser</span> and then the normal HTML parser. It is expected that
+    duplicated fetches will be prevented by normal caching rules.</p>


"normal caching rules" sounds like it's referring to well-specified caching rules. But I think in reality they're prevented by unspecified memory caches.

For better or worse, Gecko at present tries a speculative fetch at most once for each unique URL seen during a page load regardless of "normal caching rules".

Belated +1 to what @domenic said - at the very least, we need to acknowledge that the rules are currently unspecified.

@hsivonen do you think the spec should suggest that strategy for speculative fetches?
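The Gecko behaviour described in this thread (at most one speculative fetch per unique URL during a page load) could be sketched roughly as below. This is only an illustration, not Gecko's code: `SpeculativeFetcher` and its `fetch` callback are hypothetical names, and the set is assumed to live for exactly one page load.

```python
from typing import Callable, Set


class SpeculativeFetcher:
    """Hypothetical per-page-load helper: at most one speculative fetch per URL."""

    def __init__(self, fetch: Callable[[str], None]) -> None:
        self._fetch = fetch            # assumed callback that starts a request
        self._seen: Set[str] = set()   # URLs already speculatively fetched

    def speculative_fetch(self, url: str) -> bool:
        """Start a fetch unless this URL was already requested.

        Returns True if a fetch was started, False if it was suppressed.
        """
        if url in self._seen:
            return False
        self._seen.add(url)
        self._fetch(url)
        return True
```

Note that this check is independent of any HTTP or memory cache, which is the distinction being drawn against "normal caching rules".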

source

domenic · 2020-10-30T17:23:46Z

source

+   <var>speculativeParser</var>.</p></li>
+
+   <li><p><span>In parallel</span>, run <var>speculativeParser</var> until it is stopped or until it
+   reaches the end of its <span>input stream</span>.</p></li>


Do speculative parsers stop? I thought they'd just kind of try to keep going, ignoring scripts or similar...

Or, can speculative parsers have their own speculative parsers? That might be worth calling out explicitly.

I think when the normal parser starts parsing again, the speculative parser is stopped. At least I think this is true for Gecko, less sure about Chromium and WebKit. cc @hsivonen @mfreed7

In Gecko, I think a speculative parser can have a speculative parser, but I didn't do that in the spec to make the model a bit simpler. If a parser-blocking script document.writes another parser-blocking script, the spec starts over speculative parsing altogether. The main "win" for the spec is that it doesn't need to check whether the document.write did something that invalidates existing speculations. This can still be added in the future if implementers would like to have that specified.
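The simplified model described above (speculation stops when the normal parser resumes, and a new block always starts speculation over) might be pictured like this. It is a toy model under stated assumptions: `NormalParser`, `SpeculativeParser`, and all method names are hypothetical, nothing is tokenized, and `document_write` simply places markup ahead of the pending input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeParser:
    """Hypothetical stand-in for a speculative HTML parser."""
    input_snapshot: str
    stopped: bool = False

    def stop(self) -> None:
        self.stopped = True


@dataclass
class NormalParser:
    """Hypothetical model of the normal parser's blocking behaviour only."""
    pending_input: str
    active: Optional[SpeculativeParser] = None

    def block_on_script(self) -> SpeculativeParser:
        # Always start over from the current pending input. Earlier
        # speculation is stopped and never reused, so there is nothing
        # to validate against what document.write may have changed.
        self._stop_active()
        self.active = SpeculativeParser(self.pending_input)
        return self.active

    def document_write(self, markup: str) -> None:
        # Simplification: written markup is placed before the pending input.
        self.pending_input = markup + self.pending_input

    def resume(self) -> None:
        # The normal parser is parsing again, so speculation stops.
        self._stop_active()

    def _stop_active(self) -> None:
        if self.active is not None:
            self.active.stop()
            self.active = None
```

The design trade-off is the one named in the comment: discarding and restarting is wasteful compared with nested speculation, but it avoids any invalidation logic.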

source

yoavweiss · 2020-11-28T18:04:00Z

source

+
+    <p class="note">It is possible that the same markup is seen multiple times from the
+    <span>speculative HTML parser</span> and then the normal HTML parser. It is expected that
+    duplicated fetches will be prevented by normal caching rules.</p>


Belated +1 to what @domenic said - at the very least, we need to acknowledge that the rules are currently unspecified.

source

yoavweiss · 2020-11-28T19:59:31Z

source

+
+   <li>
+    <p>If the <span>speculative HTML parser</span> encounters one of the following elements, then
+    act as if that element is processed for the purpose of its effect of speculative fetches for


Nit: Chromium and WebKit's implementations deal with tags (and their tokens), not elements. Dunno if it matters.

Tokens or tags seems slightly more correct from the spec perspective too; WDYT @zcorpan?
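To illustrate what "dealing with tags, not elements" means, here is a toy tag-level scanner that collects candidate URLs from start tags without building any tree. It is not how Chromium or WebKit are implemented; it uses Python's `html.parser`, which is not a conforming HTML tokenizer, and the set of tags and attributes it looks at is an arbitrary choice for the example.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class PreloadScanner(HTMLParser):
    """Toy scanner: inspects start tags only and never creates elements."""

    def __init__(self) -> None:
        super().__init__()
        self.candidates: List[Tuple[str, str]] = []  # (tag name, URL)

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr = dict(attrs)
        src = attr.get("src")
        href = attr.get("href")
        if tag in ("script", "img") and src:
            self.candidates.append((tag, src))
        elif tag == "link" and href:
            # rel is a space-separated token list; match case-insensitively.
            rel_tokens = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens:
                self.candidates.append((tag, href))


scanner = PreloadScanner()
scanner.feed('<script src="a.js"></script>'
             '<link rel="stylesheet" href="b.css">'
             '<img src="c.png">')
```

Because no tree is built, a scanner like this cannot know, for example, whether a tag ends up inside a `template` or is reparented by the tree builder.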

zcorpan · 2021-01-21T15:27:32Z

@domenic @yoavweiss I've tried to address the feedback by being a bit more specific and inventing a concept of "speculative mock elements", which can't cause things to happen.

domenic

Wow, this is pretty inventive. After staring at it for a bit, I think it works, and I like it!!

Still some other minor code review comments to resolve, but I think this is a very clean way of semi-rigorously building up a tree structure, while making it clear none of the normal mechanisms happen.

source

zcorpan · 2021-01-22T14:00:10Z

@domenic thanks!

Another reason for this approach is that e.g. the Adoption Agency Algorithm can mutate parts of the DOM tree before the parser-blocking script, which would be bad to do speculatively on the real DOM tree.
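One way to picture the idea: the speculative tree builder works on inert data nodes in a tree of its own, so any rearranging it does cannot touch the real document. The sketch below is purely illustrative and is not the spec's algorithm; `MockElement`, `append_mock_element`, and the rule for which elements yield a URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MockElement:
    """Hypothetical inert node: plain data, no scripts run, no real DOM touched."""
    local_name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["MockElement"] = field(default_factory=list)


def append_mock_element(parent: MockElement, local_name: str,
                        attributes: Dict[str, str],
                        fetch_queue: List[str]) -> MockElement:
    """Create a mock element under `parent` and queue a speculative fetch if any."""
    element = MockElement(local_name, dict(attributes))
    parent.children.append(element)
    # Assumption for the example: only img/script with a src yield a URL.
    src = attributes.get("src")
    if local_name in ("img", "script") and src:
        fetch_queue.append(src)
    return element
```

The only output that leaves the mock tree is the queue of URLs, which is what keeps the speculation unobservable apart from fetches.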

domenic

Only a couple of minor things!

source

zcorpan · 2021-01-25T19:49:13Z

@domenic thanks, fixed!

domenic

LGTM! It seems you've got a couple open threads left with @mfreed7 and @hsivonen; I'll let you decide whether you want to wait on resolving those or merge and potentially revisit later if they come back with feedback.

zcorpan · 2021-01-25T20:42:12Z

I'd ideally like a LGTM from at least two implementers here.

andreubotella · 2021-02-03T12:34:51Z

source

+
+  <ul>
+   <li>
+    <p>The state of the normal HTML parser and the document itself must not be affected.</p>


The behavior of the input stream and the input byte stream probably needs to be specified with more detail – since tokens pushed into the input (byte) stream must also be pushed into the speculative parser's stream, but tokens read from the streams should be independent.

Thanks! 6a20e19

This list is hand-wavy, certainly, but the feedback from implementers is not to specify the speculative parser in precise detail, because engines take different approaches. For now we're mostly concerned with observable differences, and with getting this optimization specified at all.
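The requirement raised in this thread (input pushed into the stream is visible to both parsers, while each parser's read position is independent) can be sketched as a shared buffer with separate cursors. This is a simplified illustration with hypothetical names (`SharedInput`, `Reader`); it is append-only and ignores insertion points and byte decoding.

```python
from typing import List, Optional


class SharedInput:
    """Hypothetical append-only input shared by normal and speculative parsers."""

    def __init__(self) -> None:
        self._chunks: List[str] = []

    def push(self, chunk: str) -> None:
        # New input becomes visible to every reader.
        self._chunks.append(chunk)

    def chunk_at(self, index: int) -> Optional[str]:
        return self._chunks[index] if index < len(self._chunks) else None


class Reader:
    """A read cursor; advancing one reader never moves another."""

    def __init__(self, source: SharedInput, position: int = 0) -> None:
        self._source = source
        self._position = position

    def read(self) -> Optional[str]:
        chunk = self._source.chunk_at(self._position)
        if chunk is not None:
            self._position += 1
        return chunk

    def fork(self) -> "Reader":
        # A speculative parser starts where the normal parser currently is.
        return Reader(self._source, self._position)
```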

mfreed7 · 2021-02-13T02:43:10Z

So overall I like this spec change. In essence, the preload scanner shouldn't load something that the "real" parser wouldn't eventually load also, assuming no document.write()'s change things in the meantime. As a spec, I think that makes sense, and due to Gecko's use of the actual parser / tree builder, most of the tests pass there. On the Chromium side, the implementation is quite separate, with the preload scanner being a very thin approximation of the real tree builder. As such, more of the tests fail, and some might be tough to fix. One alternative is to re-architect the preload scanner to use the real tree builder, but I don't think that's worth the effort. Which means the "fix" for Chromium will amount to trying to patch the holes in the approximations being made in the existing preload scanner. I'm ok with that. There's already a nice bug summarizing those errors for us.

TL;DR, I'm supportive of landing this PR along with the test PR.

source

hsivonen · 2021-02-15T12:25:47Z

Thanks for working on this.

The spec seems to allow speculative fetches only when a script is blocking the parser. In Firefox, speculative fetches can happen even when the parser is not blocked by script but DOM building actions are being accumulated into a larger batch of work and speculative fetches are started during that accumulation.

I think this is relevant to whether Firefox complies with the spec text in the as-if sense in case DOM changes come from other sources than parser-blocking scripts (e.g. async scripts or timeouts set by a same-origin parent). Specifically, if something else disconnects a node such that the parser keeps inserting nodes that are not in the document and this causes some non-speculative fetches not to occur, Firefox still performs speculative fetches. (Images are fetched even if disconnected, but I think there are other fetch types where the non-speculative case requires the node to be in the document.)

I think it should be conforming to perform speculative fetches at any time on the assumption that no scripted action, regardless whether from a parser-blocking script or not, does anything.

In Firefox, duplicate request avoidance doesn't go all the way to the cache but the speculative load machinery refuses to speculatively fetch the same URL twice.

I don't see spec text connecting the creation of speculative mock elements to speculative fetches.

zcorpan · 2021-04-20T14:05:01Z

@hsivonen

> In Firefox, duplicate request avoidance doesn't go all the way to the cache but the speculative load machinery refuses to speculatively fetch the same URL twice.

Is it keyed off of only the URL, or a tuple of URL, crossorigin attribute's state, referrerpolicy attribute's state, or some such?

zcorpan · 2021-04-22T13:44:06Z

> The spec seems to allow speculative fetches only when a script is blocking the parser. In Firefox, speculative fetches can happen even when the parser is not blocked by script but DOM building actions are being accumulated into a larger batch of work and speculative fetches are started during that accumulation.

> I think this is relevant to whether Firefox complies with the spec text in the as-if sense in case DOM changes come from other sources than parser-blocking scripts (e.g. async scripts or timeouts set by a same-origin parent). Specifically, if something else disconnects a node such that the parser keeps inserting nodes that are not in the document and this causes some non-speculative fetches not to occur, Firefox still performs speculative fetches. [...]

> I think it should be conforming to perform speculative fetches at any time on the assumption that no scripted action, regardless whether from a parser-blocking script or not, does anything.

I'm happy to allow it. I think we need to separate speculative parsing and speculative fetching a bit more in the spec, so normal parsing can also cause speculative fetches...

> (Images are fetched even if disconnected, but I think there are other fetch types where the non-speculative case requires the node to be in the document.)

Yes, for example <link rel=stylesheet>, <link rel=modulepreload>.

<video poster> seems to not require the element to be browsing-context connected per spec: https://html.spec.whatwg.org/multipage/media.html#poster-frame

The spec allows fetching <script src> when the src attribute is set, before the element is connected:

"For performance reasons, user agents may start fetching the classic script or module graph (as defined above) as soon as the src attribute is set, instead, in the hope that the element will be inserted into the document (and that the crossorigin attribute won't change value in the meantime)."
https://html.spec.whatwg.org/multipage/scripting.html#prepare-a-script

I can't tell for <link rel=preload>, SVG <image>.

> I don't see spec text connecting the creation of speculative mock elements to speculative fetches.

Yeah, this is maybe too handwavily implied currently. I'll try to address it along with allowing regular-parsing speculative fetches.

zcorpan · 2021-05-20T13:03:39Z

I've rebased on current main to resolve merge conflicts.

Fixes #5624.

hsivonen · 2021-08-18T14:44:02Z

> In Firefox, duplicate request avoidance doesn't go all the way to the cache but the speculative load machinery refuses to speculatively fetch the same URL twice.

> Is it keyed off of only the URL, or a tuple of URL, crossorigin attribute's state, referrerpolicy attribute's state, or some such?

URL only with the twist that if there is a media query that doesn't apply, the URL is ignored instead of listed as already loaded.

This might not be a good idea. This code predates the introduction of the crossorigin attribute and the referrerpolicy attribute.
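The "media query twist" described here can be sketched as follows: a URL whose media query does not apply is skipped without being recorded, so a later occurrence of the same URL can still be fetched. This is an illustration only, not Firefox's code; `make_speculative_loader`, `fetch`, and `media_matches` are hypothetical names, and media query evaluation is assumed to be supplied by the caller.

```python
from typing import Callable, Optional, Set


def make_speculative_loader(
    fetch: Callable[[str], None],
    media_matches: Callable[[str], bool],
) -> Callable[..., bool]:
    """Return a loader that dedupes by URL only, with the media query twist."""
    seen: Set[str] = set()

    def load(url: str, media: Optional[str] = None) -> bool:
        if media is not None and not media_matches(media):
            # Ignored: NOT recorded as already loaded.
            return False
        if url in seen:
            return False
        seen.add(url)
        fetch(url)
        return True

    return load
```

Keying on the URL alone means two requests that differ only in `crossorigin` or `referrerpolicy` are treated as duplicates, which is the concern raised in the comment.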

zcorpan · 2021-08-18T14:53:47Z

@hsivonen I've addressed your comments, except I haven't specified the media query twist. I specified the duplicate fetch prevention to be URL only for now.

zcorpan · 2021-08-31T09:24:22Z

Arguably, "Let url be the URL that element would fetch if it was processed normally" covers the media query aspect (as well as type attribute, etc), though it's not very explicit.

hsivonen

Thanks. I think the spec should allow speculation while looking for meta charset and the spec should allow fetches that the normal parser will see in the future to start speculatively. This could be explained by starting a parallel speculative parser at the start of the document with the HTTP-layer encoding, if there is one, or, otherwise, the inherited encoding, if there is one, or, otherwise, UTF-8.
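The fallback order proposed above for the encoding of a speculative parser started at the beginning of the document can be written out directly. The function name and parameters are hypothetical; only the ordering comes from the comment.

```python
from typing import Optional


def speculative_encoding(http_encoding: Optional[str],
                         inherited_encoding: Optional[str]) -> str:
    """Pick the encoding to speculate with before meta charset is known."""
    if http_encoding:
        return http_encoding       # HTTP-layer encoding, if there is one
    if inherited_encoding:
        return inherited_encoding  # otherwise the inherited encoding
    return "UTF-8"                 # otherwise UTF-8
```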

source

mfreed7

This looks pretty good to me. I like the "speculative mock element" concept - makes the spec pretty straightforward here. I added a few small comments, but overall LGTM.

source

zcorpan mentioned this pull request Sep 29, 2020

Specify speculative HTML parsing (preload scanner) #5624

Closed

zcorpan marked this pull request as ready for review October 29, 2020 16:27

zcorpan changed the title ~~Define speculative HTML parsing (WIP)~~ Define speculative HTML parsing Oct 29, 2020

zcorpan force-pushed the bocoup/speculative-parsing branch from 7a81521 to 122cff2 Compare October 29, 2020 18:40

domenic reviewed Oct 30, 2020

domenic mentioned this pull request Oct 30, 2020

Speculative parser and dynamic import maps WICG/import-maps#234

Closed

yoavweiss reviewed Nov 28, 2020

domenic mentioned this pull request Jan 8, 2021

Discuss interaction with speculative parsing/fetching WICG/import-maps#241

Merged

Base automatically changed from master to main January 15, 2021 07:58

domenic reviewed Jan 22, 2021

domenic reviewed Jan 25, 2021

zcorpan force-pushed the bocoup/speculative-parsing branch from 707607d to 565b295 Compare January 25, 2021 19:48

domenic approved these changes Jan 25, 2021

andreubotella reviewed Feb 3, 2021

past mentioned this pull request Feb 10, 2021

Upcoming HTML standard issue triage meeting on 3/4/2021 #6371

Closed

annevk reviewed Feb 15, 2021

annevk added the topic: parser label Apr 28, 2021

zcorpan force-pushed the bocoup/speculative-parsing branch from 6a20e19 to d65ab07 Compare May 20, 2021 13:01

domenic force-pushed the main branch from 0947339 to 1b5099f Compare July 27, 2021 17:07

Define speculative HTML parsing

0dce7bb

Fixes #5624.

zcorpan force-pushed the bocoup/speculative-parsing branch from d65ab07 to 0dce7bb Compare August 11, 2021 14:28

zcorpan added 2 commits August 18, 2021 16:19

Avoid speculatively fetching the same URL multiple times

1fcf85f

Connect creation of speculative mock element to speculative fetch

023e502

Allow speculative fetches for normal parsing

775d2bc

hsivonen suggested changes Aug 31, 2021

Fix statement of fact: speculative fetches from normal parsing is allowed

1be217d

hsivonen approved these changes Aug 31, 2021

This was referenced Aug 31, 2021

HTML: Add tentative tests for speculative HTML parsing web-platform-tests/wpt#24521

Merged

Allow speculative fetches during meta charset prescan #7001

Open

mfreed7 reviewed Sep 11, 2021

Address mfreed7's comments

3535735

mfreed7 mentioned this pull request Sep 13, 2021

Clarify, normatively, that speculative fetches should never be duplicated #7065

Open

mfreed7 approved these changes Sep 13, 2021

zcorpan mentioned this pull request Sep 14, 2021

Allow speculative fetches in Declarative Shadow DOM template elements #7069

Closed

zcorpan merged commit 92a152c into main Sep 14, 2021

zcorpan deleted the bocoup/speculative-parsing branch September 14, 2021 10:56
