Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't cache HTTP errors in the module map #10327

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lucacasonato
Copy link
Member

@lucacasonato lucacasonato commented May 7, 2024

Multiple parallel imports will still only fetch once, and if the fetch
fails, will both error. A subsequent import (dynamic import, a
dynamically inserted via <script src="..." type="module">, or a static
import from either of these) to the same specifier will result in a
re-fetch if the previous fetch failed.

Fixes #6768

  • At least two implementers are interested (and none opposed):
  • Tests are written and can be reviewed and commented upon at:
  • Implementation bugs are filed:
    • Chromium: …
    • Gecko: …
    • WebKit: …
    • Deno (only for timers, structured clone, base64 utils, channel messaging, module resolution, web workers, and web storage): …
    • Node.js (only for timers, structured clone, base64 utils, channel messaging, and module resolution): …
  • MDN issue is filed: …
  • The top of this comment includes a clear commit message to use.

(See WHATWG Working Mode: Changes for more details.)


/acknowledgements.html ( diff )
/webappapis.html ( diff )

@lucacasonato lucacasonato force-pushed the dont_cache_dynamic_import_http_errors branch from e7b9493 to b94700d Compare May 7, 2024 12:02
@lucacasonato lucacasonato changed the title Don't cache HTTP errors in the import module cache Don't cache HTTP errors in the module map May 7, 2024
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me modulo nits.

cc @whatwg/modules

source Show resolved Hide resolved
source Show resolved Hide resolved
Multiple parallel imports will still only fetch once, and if the fetch
fails, will both error. A subsequent import (dynamic import, a
dynamically inserted via `<script src="..." type="module">`, or a static
import from either of these) to the same specifier will result in a
re-fetch if the previous fetch failed.
@lucacasonato lucacasonato force-pushed the dont_cache_dynamic_import_http_errors branch from b94700d to 728c268 Compare May 7, 2024 13:10
source Outdated
Comment on lines 106896 to 106898
<li><p>If <var>moduleMap</var>[(<var>url</var>, <var>moduleType</var>)] <span
data-x="map exists">exists</span>, run <var>onComplete</var> given
<var>moduleMap</var>[(<var>url</var>, <var>moduleType</var>)], and return.</p></li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle the case in which moduleMap[(url, moduleType)] exists and is "fetching", since we should not pass "fetching" to onComplete.

Test case:

await Promise.all([
  /* 1 */ import("./network-error.js").catch(() =>
            /* 3 */ import("./network-error.js").catch(() => {})
          ),
  /* 2 */ import("./network-error.js").catch(() => {})
]);

This is what I think is happening (but please somebody with better understanding of event loops confirm):

  1. /* 1 */ sets moduleMap["./network-error.js"] to "fetching", and performs the fetch. When that fetch is done, a task T1 is scheduled to the networking task source.
  2. /* 2 */ sees that moduleMap["./network-error.js"] is "fetching", and waits.
  3. At some point the event loop runs T1, which will run the processResponseConsumeBody steps of fetch a single module script:
    1. It changes moduleMap[( url , moduleType )] (it removes it), so a task T2 is scheduled to continue running /* 2 */.
    2. It runs onComplete with null, which will call FinishLoadingImportedModule with an error completion in the last step of HostLoadImportedModule.
    3. This will then reject the dynamic import promise in ContinueDynamicImport, thus queueing a microtask (in HostEnqueuePromiseJob) to run /* 1 */'s .catch() callback
    4. Once this is done, we drain the microtask queue and thus execute /* 3 */:
      1. moduleMap doen't have a "./network-error.js" entry anymore, so it sets moduleMap["./network-error.js"] to "fetching", and performs the fetch.
  4. At this point we are done running T1, and we can start running T2. moduleMap["./network-error.js"] exists (and has value "fetching"), so we run onComplete with "fetching" which is wrong.

Once this is fixed the code above should probably be used for a WPT test.

@nicolo-ribaudo
Copy link
Contributor

nicolo-ribaudo commented May 7, 2024

The current safari behavior is that in this case:

import("./network-error.js");
import("./network-error.js");

after the first import fails, it performs a second fetch for the second one (rather than deduplicating them and immediately erroring for both.)

And in this case:

import("./network-error.js");
import("./network-error.js");
import("./network-error.js");

it performs three fetches serially.

@lucacasonato
Copy link
Member Author

I have addressed @nicolo-ribaudo's review comment by removing the race condition between notification and read of the value in the module map. PTAL again @annevk.

Note that the behaviour here is not identical to Safari in the case where you a parallel dynamic import for the same specifier. In this PR, both dynamic imports will always resolve / reject with the same value. For dynamic imports started at the same time, you can't resolve one, but reject another.

Copy link
Contributor

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@nicolo-ribaudo
Copy link
Contributor

nicolo-ribaudo commented May 7, 2024

One change introduced by this PR is that now this code:

import("x").then(() => {
  console.log(1);
  Promise.resolve().then(() => console.log(2));
});
import("x").then(() => {
  console.log(3);
  Promise.resolve().then(() => console.log(4));
});

is guaranteed to log (on success) 1, 3, 2, 4.

Previously, the possible behaviors where 1, 2, 3, 4 or 3, 4, 1, 2 (which one exactly was implementation-defined I think).

I think this change is ok/good, but if we want to revert it it can be done by scheduling tasks to call the various callbacks rather than calling them directly in the loop.

@domenic
Copy link
Member

domenic commented May 8, 2024

I'm working on getting a Chromium position on this. I am hopeful it will be positive.

However, we'll likely need help with someone updating web platform tests, and adding new ones, including adding ones for the new scenario @nicolo-ribaudo points out, and for the differences between this and current Safari behavior.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an editor perspective, these changes LGTM, although I've also asked if we can get @hiroshige-g's time to review.

However, I'm unsure why we're counting MIME type mismatches as permanent. I agree they're less likely to be transient in practice, but it seems a bit strange to have one class of bad response treated so differently than all others.

@nyaxt
Copy link
Member

nyaxt commented May 8, 2024

I'm generally supportive of the proposal.

I think this change would steer the module fetching behavior more toward availability (vs determinism which was prioritized when the feature was originally designed), but that seems well-aligned with the current web author needs.

My only question is about the introduction of the callbacks. Would you mind elaborating on the context of the change (why is the old wait in parallel until that entry's value changes insufficient)?

@hiroshige-g
Copy link
Contributor

I think the PR works consistently (at least basically) with the high-level semantics of:

  1. Fetch-failed module requests are NOT retried within a single module graph.
  2. Fetch-failed module requests MIGHT be retried across module graphs.

I haven't investigated the behavoir/impact on the Chromium implementation though, as the behavior/invariants in the current Chromium/V8 are different from the current spec.

Specific cases and behaviors follows below (which is probably OK?).

1: never re-request a module within a single module graph

<script type=module src=a.js> for:
a.js: import "./error.js"; import "./b.js";
error.js: network error
b.js: import "./error.js";

error.js is requested only once because:

  • If the network error of error.js is received before b.js, then moduleMap[error.js] is removed before b.js is received, but the module graph fetching is early terminated due to GraphLoadingState Record's [[IsLoading]] and thus error.js is NOT re-requested when b.js is received.
  • Otherwise, the two requests to error.js is merged by moduleMap[error.js] before the network error of error.js is received (and resolved with null).

This mechanism doesn't match with the current Chromium implementation that doesn't early-exit on failure, but probably this is implementable.

2: MIGHT re-request submodules across module graphs

<script id=script1 type=module src=a.js>
...
<script id=script2 type=module src=a.js>

for:
a.js: import "./error.js";
error.js: network error for the first time and then success for the second time.

The error.js MIGHT be re-requested:

  • If the a.js -> error.js loading for script1 completes before script2 is inserted to DOM, then moduleMap[error.js] is reset before script2 and thus error.js is re-requested and script2 succeeds.
  • Otherwise, then the two requests to error.js from script1 and script2 are merged into a single request via moduleMap[error.js] and then fail. So error.js is requested only once and script2 fails.

3: failure order not guaranteed

<script id=script1 type=module src=a.js>
<script id=script2 type=module src=b.js>

for:
a.js: import "./error.js"; import "./b.js";
error.js: network error for the first time and then success for the second time.
b.js: import "./error.js"; import "./a.js";

  • a.js -> error.js -> b.js: script1 fails, script2 succeeds, erros.js re-requested.
  • a.js -> b.js -> error.js: script1 fails, script2 fails, erros.js not re-requested.
  • b.js -> error.js -> a.js: script1 succeeds, script2 fails, erros.js re-requested.

4: consistency over time

I was concerned that re-requesting would break consistency/invariants and cause crashes by chaning module scripts from null to non-null over time (e.g. module graphs look different over time and causes obsolete/inconsistent pointer access and state transition).

But so far this doesn't look the case and the PR works consistently:

  • The result of GetImportedModule is called only when the module must be successful. So re-requesting a failed module and replacing it with a successful module wouldn't be observable via GetImportedModule.
  • Module loading, linking and evaluation don't proceed after the first load-failed module is encountered, so the failed module script shouldn't be referenced.

(Similar consistency/invariant checks must be done in Chromium/V8 implementation side when implementing.)

@lucacasonato
Copy link
Member Author

@domenic I have updated the change to also not cache of mime type errors.


@nyaxt Thanks for the feedback. The change to callback was necessary to be able to accurately reason about the state of each invocation. If implemented without the callbacks, there was a race condition where between the "in parallel wait until change" and the actual read of the moduleMap, in which time the value in the module map could have changed again. Because of that, a listener may see a different value in the module map (for example it could see "pending", because a different dynamic import could have started another fetch for the same module). In that case the first dynamic import would not error after the first fetch, but instead would return the result of the second fetch.

With the callback change, the new moduleMap value for this entry, for this onCompleted call is passed explicitly to the onCompleted callback - thus removing the race condition.


@hiroshige-g Thanks for the elaborate notes. This matches my mental model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Failed dynamic import should not always be cached
6 participants