Don't cache HTTP errors in the module map #10327

lucacasonato · 2024-05-07T11:58:07Z

Multiple parallel imports will still only fetch once, and if the fetch
fails, will both error. A subsequent import (dynamic import, a
dynamically inserted via <script src="..." type="module">, or a static
import from either of these) to the same specifier will result in a
re-fetch if the previous fetch failed.

Fixes #6768

At least two implementers are interested (and none opposed):
- Safari
- …
Tests are written and can be reviewed and commented upon at:
- …
Implementation bugs are filed:
- Chromium: …
- Gecko: …
- WebKit: …
- Deno (only for timers, structured clone, base64 utils, channel messaging, module resolution, web workers, and web storage): …
- Node.js (only for timers, structured clone, base64 utils, channel messaging, and module resolution): …
MDN issue is filed: …
The top of this comment includes a clear commit message to use.

(See WHATWG Working Mode: Changes for more details.)

/acknowledgements.html ( diff )
/webappapis.html ( diff )

annevk

Looks good to me modulo nits.

cc @whatwg/modules

source

Multiple parallel imports will still only fetch once, and if the fetch fails, will both error. A subsequent import (dynamic import, a dynamically inserted via `<script src="..." type="module">`, or a static import from either of these) to the same specifier will result in a re-fetch if the previous fetch failed.

nicolo-ribaudo · 2024-05-07T13:30:01Z

source

+     <li><p>If <var>moduleMap</var>[(<var>url</var>, <var>moduleType</var>)] <span
+     data-x="map exists">exists</span>, run <var>onComplete</var> given
+     <var>moduleMap</var>[(<var>url</var>, <var>moduleType</var>)], and return.</p></li>


We need to handle the case in which moduleMap[(url, moduleType)] exists and is "fetching", since we should not pass "fetching" to onComplete.

Test case:

await Promise.all([ /* 1 */ import("./network-error.js").catch(() => /* 3 */ import("./network-error.js").catch(() => {}) ), /* 2 */ import("./network-error.js").catch(() => {}) ]);

This is what I think is happening (but please somebody with better understanding of event loops confirm):

/* 1 */ sets moduleMap["./network-error.js"] to "fetching", and performs the fetch. When that fetch is done, a task T1 is scheduled to the networking task source.

/* 2 */ sees that moduleMap["./network-error.js"] is "fetching", and waits.

At some point the event loop runs T1, which will run the processResponseConsumeBody steps of fetch a single module script:

It changes moduleMap[( url , moduleType )] (it removes it), so a task T2 is scheduled to continue running /* 2 */.

It runs onComplete with null, which will call FinishLoadingImportedModule with an error completion in the last step of HostLoadImportedModule.

This will then reject the dynamic import promise in ContinueDynamicImport, thus queueing a microtask (in HostEnqueuePromiseJob) to run /* 1 */'s .catch() callback

Once this is done, we drain the microtask queue and thus execute /* 3 */:

moduleMap doen't have a "./network-error.js" entry anymore, so it sets moduleMap["./network-error.js"] to "fetching", and performs the fetch.

At this point we are done running T1, and we can start running T2. moduleMap["./network-error.js"] exists (and has value "fetching"), so we run onComplete with "fetching" which is wrong.

Once this is fixed the code above should probably be used for a WPT test.

nicolo-ribaudo · 2024-05-07T13:58:32Z

The current safari behavior is that in this case:

import("./network-error.js");
import("./network-error.js");

after the first import fails, it performs a second fetch for the second one (rather than deduplicating them and immediately erroring for both.)

And in this case:

import("./network-error.js");
import("./network-error.js");
import("./network-error.js");

it performs three fetches serially.

lucacasonato · 2024-05-07T15:12:38Z

I have addressed @nicolo-ribaudo's review comment by removing the race condition between notification and read of the value in the module map. PTAL again @annevk.

Note that the behaviour here is not identical to Safari in the case where you a parallel dynamic import for the same specifier. In this PR, both dynamic imports will always resolve / reject with the same value. For dynamic imports started at the same time, you can't resolve one, but reject another.

nicolo-ribaudo

LGTM, thanks!

nicolo-ribaudo · 2024-05-07T15:22:44Z

One change introduced by this PR is that now this code:

import("x").then(() => {
  console.log(1);
  Promise.resolve().then(() => console.log(2));
});
import("x").then(() => {
  console.log(3);
  Promise.resolve().then(() => console.log(4));
});

is guaranteed to log (on success) 1, 3, 2, 4.

Previously, the possible behaviors where 1, 2, 3, 4 or 3, 4, 1, 2 (which one exactly was implementation-defined I think).

I think this change is ok/good, but if we want to revert it it can be done by scheduling tasks to call the various callbacks rather than calling them directly in the loop.

domenic · 2024-05-08T02:51:13Z

I'm working on getting a Chromium position on this. I am hopeful it will be positive.

However, we'll likely need help with someone updating web platform tests, and adding new ones, including adding ones for the new scenario @nicolo-ribaudo points out, and for the differences between this and current Safari behavior.

domenic

From an editor perspective, these changes LGTM, although I've also asked if we can get @hiroshige-g's time to review.

However, I'm unsure why we're counting MIME type mismatches as permanent. I agree they're less likely to be transient in practice, but it seems a bit strange to have one class of bad response treated so differently than all others.

nyaxt · 2024-05-08T04:21:47Z

I'm generally supportive of the proposal.

I think this change would steer the module fetching behavior more toward availability (vs determinism which was prioritized when the feature was originally designed), but that seems well-aligned with the current web author needs.

My only question is about the introduction of the callbacks. Would you mind elaborating on the context of the change (why is the old wait in parallel until that entry's value changes insufficient)?

hiroshige-g · 2024-05-09T01:49:01Z

I think the PR works consistently (at least basically) with the high-level semantics of:

Fetch-failed module requests are NOT retried within a single module graph.
Fetch-failed module requests MIGHT be retried across module graphs.

I haven't investigated the behavoir/impact on the Chromium implementation though, as the behavior/invariants in the current Chromium/V8 are different from the current spec.

Specific cases and behaviors follows below (which is probably OK?).

1: never re-request a module within a single module graph

<script type=module src=a.js> for:
a.js: import "./error.js"; import "./b.js";
error.js: network error
b.js: import "./error.js";

error.js is requested only once because:

If the network error of error.js is received before b.js, then moduleMap[error.js] is removed before b.js is received, but the module graph fetching is early terminated due to GraphLoadingState Record's [[IsLoading]] and thus error.js is NOT re-requested when b.js is received.
Otherwise, the two requests to error.js is merged by moduleMap[error.js] before the network error of error.js is received (and resolved with null).

This mechanism doesn't match with the current Chromium implementation that doesn't early-exit on failure, but probably this is implementable.

2: MIGHT re-request submodules across module graphs

<script id=script1 type=module src=a.js>
...
<script id=script2 type=module src=a.js>

for:
a.js: import "./error.js";
error.js: network error for the first time and then success for the second time.

The error.js MIGHT be re-requested:

If the a.js -> error.js loading for script1 completes before script2 is inserted to DOM, then moduleMap[error.js] is reset before script2 and thus error.js is re-requested and script2 succeeds.
Otherwise, then the two requests to error.js from script1 and script2 are merged into a single request via moduleMap[error.js] and then fail. So error.js is requested only once and script2 fails.

3: failure order not guaranteed

<script id=script1 type=module src=a.js>
<script id=script2 type=module src=b.js>

for:
a.js: import "./error.js"; import "./b.js";
error.js: network error for the first time and then success for the second time.
b.js: import "./error.js"; import "./a.js";

a.js -> error.js -> b.js: script1 fails, script2 succeeds, erros.js re-requested.
a.js -> b.js -> error.js: script1 fails, script2 fails, erros.js not re-requested.
b.js -> error.js -> a.js: script1 succeeds, script2 fails, erros.js re-requested.

4: consistency over time

I was concerned that re-requesting would break consistency/invariants and cause crashes by chaning module scripts from null to non-null over time (e.g. module graphs look different over time and causes obsolete/inconsistent pointer access and state transition).

But so far this doesn't look the case and the PR works consistently:

The result of GetImportedModule is called only when the module must be successful. So re-requesting a failed module and replacing it with a successful module wouldn't be observable via GetImportedModule.
Module loading, linking and evaluation don't proceed after the first load-failed module is encountered, so the failed module script shouldn't be referenced.

(Similar consistency/invariant checks must be done in Chromium/V8 implementation side when implementing.)

lucacasonato · 2024-06-03T15:03:50Z

@domenic I have updated the change to also not cache of mime type errors.

@nyaxt Thanks for the feedback. The change to callback was necessary to be able to accurately reason about the state of each invocation. If implemented without the callbacks, there was a race condition where between the "in parallel wait until change" and the actual read of the moduleMap, in which time the value in the module map could have changed again. Because of that, a listener may see a different value in the module map (for example it could see "pending", because a different dynamic import could have started another fetch for the same module). In that case the first dynamic import would not error after the first fetch, but instead would return the result of the second fetch.

With the callback change, the new moduleMap value for this entry, for this onCompleted call is passed explicitly to the onCompleted callback - thus removing the race condition.

@hiroshige-g Thanks for the elaborate notes. This matches my mental model.

lucacasonato force-pushed the dont_cache_dynamic_import_http_errors branch from e7b9493 to b94700d Compare May 7, 2024 12:02

lucacasonato changed the title ~~Don't cache HTTP errors in the import module cache~~ Don't cache HTTP errors in the module map May 7, 2024

annevk reviewed May 7, 2024

View reviewed changes

source Show resolved Hide resolved

source Show resolved Hide resolved

lucacasonato force-pushed the dont_cache_dynamic_import_http_errors branch from b94700d to 728c268 Compare May 7, 2024 13:10

nicolo-ribaudo reviewed May 7, 2024

View reviewed changes

annevk mentioned this pull request May 7, 2024

Failed dynamic import should not always be cached #6768

Open

lucacasonato added 2 commits May 7, 2024 17:06

address review comment

7e99d16

fix build

6176727

nicolo-ribaudo reviewed May 7, 2024

View reviewed changes

domenic reviewed May 8, 2024

View reviewed changes

Don't cache mime type errors either

9eb3c1e

nicolo-ribaudo mentioned this pull request Jun 12, 2024

[JSC] ReferenceError when multiple modules are simultaneously importing a module containing a top-level await WebKit/WebKit#24122

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't cache HTTP errors in the module map #10327

Don't cache HTTP errors in the module map #10327

lucacasonato commented May 7, 2024 •

edited by pr-preview bot

Loading

annevk left a comment

nicolo-ribaudo May 7, 2024

nicolo-ribaudo commented May 7, 2024 •

edited

Loading

lucacasonato commented May 7, 2024

nicolo-ribaudo left a comment •

edited

Loading

nicolo-ribaudo commented May 7, 2024 •

edited

Loading

domenic commented May 8, 2024 •

edited

Loading

domenic left a comment •

edited

Loading

nyaxt commented May 8, 2024

hiroshige-g commented May 9, 2024

lucacasonato commented Jun 3, 2024

Don't cache HTTP errors in the module map #10327

Are you sure you want to change the base?

Don't cache HTTP errors in the module map #10327

Conversation

lucacasonato commented May 7, 2024 • edited by pr-preview bot Loading

annevk left a comment

Choose a reason for hiding this comment

nicolo-ribaudo May 7, 2024

Choose a reason for hiding this comment

nicolo-ribaudo commented May 7, 2024 • edited Loading

lucacasonato commented May 7, 2024

nicolo-ribaudo left a comment • edited Loading

Choose a reason for hiding this comment

nicolo-ribaudo commented May 7, 2024 • edited Loading

domenic commented May 8, 2024 • edited Loading

domenic left a comment • edited Loading

Choose a reason for hiding this comment

nyaxt commented May 8, 2024

hiroshige-g commented May 9, 2024

1: never re-request a module within a single module graph

2: MIGHT re-request submodules across module graphs

3: failure order not guaranteed

4: consistency over time

lucacasonato commented Jun 3, 2024

lucacasonato commented May 7, 2024 •

edited by pr-preview bot

Loading

nicolo-ribaudo commented May 7, 2024 •

edited

Loading

nicolo-ribaudo left a comment •

edited

Loading

nicolo-ribaudo commented May 7, 2024 •

edited

Loading

domenic commented May 8, 2024 •

edited

Loading

domenic left a comment •

edited

Loading