Proposal: beforePutToBFcache and afterRestoreFromBFcache events for DedicatedWorkerGlobalScope #7216

Open
hajimehoshi opened this issue Oct 14, 2021 · 38 comments

@hajimehoshi

hajimehoshi commented Oct 14, 2021

Explainer: beforePutToBFcache and afterRestoreFromBFcache events for DedicatedWorkerGlobalScope

Authors

@hajimehoshi

Introduction

Today, browsers use an optimization called the Back-Forward Cache (a.k.a. BFCache) when users navigate their browser’s history. BFcache enables an instant loading experience when users go back to a page they have recently visited.

Not every page can use this optimization. Different browsers use different heuristics to opt a page out of BFCache when it uses certain features. This feature detection happens not only in documents but also in web workers, so if a worker uses a feature that is not compatible with BFCache, the document might not be eligible for BFCache.

In a document, web authors can listen for the pagehide and pageshow events, which are fired on window. pagehide is fired when a navigation happens, before the decision is made whether the page is put in BFcache. pageshow is fired when the page is restored from BFcache by a history navigation. pagehide gives web authors the opportunity to handle features that can affect BFcache. For example, a page with an IndexedDB connection might not be eligible for BFcache in some browsers. In this case, by closing the connection at pagehide, the page is more likely to be put into BFcache in those browsers. Authors can then reopen the connection at pageshow.
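A minimal sketch of the document-side pattern just described, assuming a browser where an open IndexedDB connection blocks BFcache. `installBfcacheHandlers` is a hypothetical helper, not a platform API; to keep it runnable outside a page, the event target and the `openDb` callback are injected (in a page you would pass `window` and a function wrapping `indexedDB.open`).

```javascript
// Hypothetical helper (not a platform API): keep an IndexedDB-style
// connection from blocking BFCache by closing it on pagehide and
// reopening it on pageshow.
//   target: any EventTarget; in a page this would be `window`.
//   openDb: called with a callback that receives the new connection,
//           mirroring the asynchronous indexedDB.open() flow.
function installBfcacheHandlers(target, openDb) {
  const state = { db: null };
  const connect = () => openDb((connection) => { state.db = connection; });
  connect();

  target.addEventListener("pagehide", (event) => {
    // persisted is true when the page may be kept in BFCache.
    if (event.persisted && state.db) {
      state.db.close();
      state.db = null;
    }
  });

  target.addEventListener("pageshow", (event) => {
    // persisted is true when the page was restored from BFCache.
    if (event.persisted && !state.db) {
      connect();
    }
  });

  return state;
}

// In a page, the helper might be wired to IndexedDB like this:
// installBfcacheHandlers(window, (done) => {
//   const req = indexedDB.open("foo");
//   req.onsuccess = () => done(req.result);
// });
```

Checking `persisted` on pagehide means a page that is simply being unloaded (not cached) skips the extra close and reopen work.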

In a dedicated worker, there is currently no way to handle such lifecycle changes.
This means that it can be difficult to cache pages that use a dedicated worker.
However, it is not feasible to add pagehide and pageshow events to DedicatedWorkerGlobalScope.
As a dedicated worker runs in a different thread, page lifecycle semantics like those of pagehide and pageshow don't match dedicated workers.
For example, if a dedicated worker's task runs for a very long time, both pagehide and pageshow might happen on the page while that task is still running.
In this case, it would be semantically incorrect to fire pagehide and pageshow in the worker after the long task finishes.

To improve this situation, this proposal adds beforePutToBFcache and afterRestoreFromBFcache events to DedicatedWorkerGlobalScope.
beforePutToBFcache is fired when the browser is deciding whether the page should be put into BFcache.
afterRestoreFromBFcache is fired after the browser restores the page from BFcache.

Goals

This proposal adds new events, beforePutToBFcache and afterRestoreFromBFcache, to DedicatedWorkerGlobalScope. They give web authors a chance to observe when the associated document is being put into BFcache and to react to it.
These events are fired only when the dedicated worker doesn't have a shared worker or a service worker in its ancestor chain.
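The eligibility rule above can be sketched as a walk up the owner chain. The `{ kind, owner }` objects are a hypothetical model of the worker hierarchy for illustration only, not a browser API; a document has no owner.

```javascript
// Hypothetical model: each global is { kind, owner }, where kind is one of
// "document", "dedicated", "shared", or "service", and owner is the global
// that created it (null for a document or a top-level shared/service worker).
function receivesBfcacheEvents(worker) {
  if (worker.kind !== "dedicated") return false;
  for (let node = worker.owner; node !== null; node = node.owner) {
    // A shared or service worker anywhere in the chain disqualifies it.
    if (node.kind === "shared" || node.kind === "service") return false;
    // Reached the owning document without meeting one: eligible.
    if (node.kind === "document") return true;
  }
  // No document at the root of the chain: there is no page lifecycle to follow.
  return false;
}
```

A dedicated worker nested inside another dedicated worker stays eligible as long as the chain ends at a document, while a dedicated worker created (directly or indirectly) by a shared worker does not.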

Non-goals

This proposal doesn't add the events to any workers or worklets other than dedicated workers.
A dedicated worker that doesn't have a shared worker or a service worker in its ancestor chain belongs to exactly one document, so it is natural for it to handle that document's lifecycle events. However,

  • A shared worker and a service worker are shared by multiple documents, so it is not natural to add such events to them.
    • This is the same if there is a shared worker or a service worker in a dedicated worker's ancestor chain.
  • Like a dedicated worker, a worklet belongs to one document, so we might be able to add the events to worklets in the future, but this is not a goal of this proposal. We should revisit worklets later.

API

partial interface DedicatedWorkerGlobalScope {
    attribute EventHandler onBeforePutToBFcache;
    attribute EventHandler onAfterRestoreFromBFcache;
};

The event beforePutToBFcache is dispatched when the page is navigated away from, before unloading.
This event is dispatched before the decision is made whether the page is put into BFcache.
This is similar to window's pagehide, with one difference: when the browser gives up putting the page into BFcache, the event is not fired.
For example, if a dedicated worker's task takes very long, the browser might give up using BFcache.

The event afterRestoreFromBFcache is dispatched when the page is restored from BFcache by a history navigation.
This is similar to window's pageshow, but unlike pageshow it is not fired on the initial page load.

The event type is PageTransitionEvent.
The persisted read-only member is always true.

Example

let db = null;

self.onBeforePutToBFcache = (e) => {
  if (e.persisted) {
    // This page is being cached: release the connection so it doesn't block BFcache.
    if (db) {
      db.close();
      db = null;
    }
  }
};

self.onAfterRestoreFromBFcache = (e) => {
  if (e.persisted) {
    // This page is being restored from cache: reopen the connection.
    const req = indexedDB.open("foo");
    req.onsuccess = () => {
      db = req.result;
    };
  }
};

Discussion

Why not postMessage from a frame?

The browser can decide whether to cache the page only after all the beforePutToBFcache events have been handled.
postMessage only notifies dedicated workers asynchronously, and the browser cannot wait for the workers to finish handling those messages.
So it is impossible for the browser to make this decision with postMessage.

Note that navigation itself happens immediately regardless of whether the page is cached or not.
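A small single-threaded simulation of the argument above. It is only a model: `SimulatedWorker`, `enqueue`, and the two decision functions are hypothetical names, and real workers run on other threads. With postMessage the browser has to decide before the worker has run its handler; with a dedicated event the browser can wait for the handler first.

```javascript
// Hypothetical single-threaded model; not how a browser is implemented.
class SimulatedWorker {
  constructor() {
    this.holdsBlockingResource = true; // e.g. an open IndexedDB connection
    this.tasks = [];                   // tasks waiting to run, FIFO
  }
  enqueue(task) {
    this.tasks.push(task);             // queued only; nothing runs yet
  }
  runPendingTasks() {
    while (this.tasks.length > 0) this.tasks.shift()(this);
  }
}

// The worker-side cleanup a web author would write.
const releaseResource = (worker) => { worker.holdsBlockingResource = false; };

// Userland postMessage: the page queues a message, and the browser has to
// decide right away because it cannot observe when the message is handled.
function cacheableWithPostMessage(worker) {
  worker.enqueue(releaseResource);
  return !worker.holdsBlockingResource;
}

// Proposed event: the browser queues the event and waits for the worker
// to handle it (modelled here as running the pending tasks) before deciding.
function cacheableWithEvent(worker) {
  worker.enqueue(releaseResource);
  worker.runPendingTasks();
  return !worker.holdsBlockingResource;
}

console.log(cacheableWithPostMessage(new SimulatedWorker())); // false
console.log(cacheableWithEvent(new SimulatedWorker()));       // true
```

In a real browser the wait would have to be bounded, which is the long-running-task concern raised later in the thread.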


/CC @domenic @nhiroki @rakina @fergald

@domenic
Member

domenic commented Oct 14, 2021

One dedicated worker is associated with one document, and a dedicated worker should follow its associated document's lifecycle.

This is unfortunately not true in the spec and in Firefox. As discussed in #6379, shared workers can own dedicated workers, and shared workers have no clear owner document.

We might just say that such shared-worker-owned dedicated workers are out of scope for this proposal, but if so we'd need to be explicit, and ensure that that doesn't have any bad impacts.

In particular it seems like some of your "non-goals" section is based specifically on this assumption, so might need to be re-thought.

the same timing

Could you be a bit more concrete about what you are proposing? In particular since they are in different threads I don't think we can guarantee any order. I guess we would post a task from the main thread into the worker thread that fires the event? Is there any ordering with relation to other interesting worker lifecycle events, or posted messages?

@hajimehoshi
Author

Thanks!

We might just say that such shared-worker-owned dedicated workers are out of scope for this proposal, but if so we'd need to be explicit, and ensure that that doesn't have any bad impacts.

Would it make sense to fire the events in a dedicated worker only when

  • the dedicated worker is not owned by any other workers OR
  • the dedicated worker is owned by another dedicated worker

?

Could you be a bit more concrete about what you are proposing? In particular since they are in different threads I don't think we can guarantee any order. I guess we would post a task from the main thread into the worker thread that fires the event? Is there any ordering with relation to other interesting worker lifecycle events, or posted messages?

Yes, I was thinking the events would be fired from tasks posted from the main thread. I don't think workers have any other lifecycle events so far.

So would the sentence "the events are fired from tasks that are posted from the main thread when pagehide and pageshow are dispatched" be fine?

@wanderview
Member

It feels weird to me for the events to be tied to the existence of a document owner (direct or indirect). I see why "persisted" would only be true if there is a doc owner, but wouldn't we want to always fire them?

Also, would these be fired for worker.terminate() or if the worker script calls self.close()?

@domenic
Member

domenic commented Oct 15, 2021

Would it make sense to fire the events in a dedicated worker only when

This doesn't work as stated because a dedicated worker can be owned by a dedicated worker which can be owned by a shared worker. But I assume you are trying to go for a scenario where there are no shared workers in the ancestor chain, which would work. It's just a little weird as @wanderview calls out.

I don't think there are any other lifecycle events in workers so far.

No events, but there could be posted tasks. E.g. consider

worker.postMessage("1");
window.onbeforeunload = () => worker.postMessage("2"); 
window.onpagehide = () => worker.postMessage("3");
window.onunload = () => worker.postMessage("4");

causePageToUnload();

Is there any ordering guarantee of the pagehide event inside the worker, versus the worker receiving messages 1-4?
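One way to reason about this question is to assume every cross-thread task, whether a message or the proposed event, lands in a single FIFO queue on the worker. That is an assumption: the HTML standard does not order tasks from different task sources relative to each other, so a real spec would have to pick the task source deliberately. The hypothetical `workerObservedOrder` below shows that, even under this assumption, the answer depends on whether the event is queued before or after the window's pagehide handlers run. Message 4 (from onunload) is left out to keep the sketch focused on the pagehide step.

```javascript
// Hypothetical model: every task bound for the worker goes into one FIFO
// queue. Returns the order in which the worker would observe them.
function workerObservedOrder({ eventBeforePagehideHandlers }) {
  const workerQueue = [];
  workerQueue.push("message 1");   // worker.postMessage("1")
  workerQueue.push("message 2");   // onbeforeunload handler
  if (eventBeforePagehideHandlers) workerQueue.push("beforePutToBFcache");
  workerQueue.push("message 3");   // onpagehide handler
  if (!eventBeforePagehideHandlers) workerQueue.push("beforePutToBFcache");
  return workerQueue;
}
```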

@hajimehoshi
Author

I see why "persisted" would only be true if there is a doc owner, but wouldn't we want to always fire them?

Do you mean it seems weird that pagehide / pageshow would be fired for only some dedicated workers?

Also, would these be fired for worker.terminate() or if the worker script calls self.close()?

I don't think the events should be fired in this case, as those are not related to a document's lifecycle.

But I assume you are trying to go for a scenario where there are no shared workers in the ancestor chain, which would work.

Ah yes, that's what I intended. The events are fired for a dedicated worker only when there is no shared worker in its ancestor chain.

Is there any ordering guarantee of the pagehide event inside the worker, versus the worker receiving messages 1-4?

I hadn't thought about that. Hmm, I feel like we should guarantee an order, but I'm not familiar with postMessage's behavior. @nhiroki What do you think?

@wanderview
Member

Do you mean it seems weird that pagehide / pageshow would be fired for only some dedicated workers?

Right.

I don't think the events should be called in this case, as those are not related to a document's lifecycle.

It seems to me events fired in a worker should relate to the worker lifecycle, not to an owner that may or may not be present. Otherwise you cannot write worker script code that relies on these events without assumptions about who owns the worker.

At the core we are adding freeze/thaw type concepts to the worker lifecycle. Workers already have the lifecycle concept of "creation" and "destruction". Do those not map to "pageshow" and "pagehide" here?

If we aren't building worker lifecycle events, but just proxying document events into the worker, then what is the benefit of the platform providing that vs userland using postMessage() themselves?

@domenic
Member

domenic commented Oct 15, 2021

It seems to me events fired in a worker should relate to the worker lifecycle, not to an owner that may or may not be present.

I think it's OK to fire events in a worker related to something else's lifecycle, especially if they are clearly named as such. The page prefix, IMO, makes it pretty clear.

If we aren't building worker lifecycle events, but just proxying document events into the worker, then what is the benefit of the platform providing that vs userland using postMessage() themselves?

I believe the proposal in the OP is indeed just proxying. It is indeed a good question why postMessage() doesn't work. My guess is because it creates a coordination problem where you need the page author to do the proxying, which makes it hard to rely on in reusable libraries or similar that want to work in a worker. But I'd love to hear more from @hajimehoshi in that regard.

@hajimehoshi
Author

I believe the proposal in the OP is indeed just proxying. It is indeed a good question why postMessage() doesn't work

The benefit of pagehide events in dedicated workers is that the browser can decide whether the page can be cached or not after all the pagehide events are done. With postMessage, the browser cannot wait for the dedicated workers' actions before the page is cached. Does this make sense?

Note that the navigation itself can happen immediately, regardless of whether the page is cached.

@hajimehoshi
Author

@domenic Ping

@domenic
Member

domenic commented Oct 19, 2021

Yes, this makes sense.

@hajimehoshi
Author

@wanderview

At the core we are adding freeze/thaw type concepts to the worker lifecycle. Workers already have the lifecycle concept of "creation" and "destruction". Do those not map to "pageshow" and "pagehide" here?
If we aren't building worker lifecycle events, but just proxying document events into the worker, then what is the benefit of the platform providing that vs userland using postMessage() themselves?

So, with this proposal, we are just proxying document events (pageshow and pagehide) to workers. postMessage doesn't work because the browser needs to wait for the results of the event handlers in dedicated workers before it decides whether to cache the page. Does this make sense to you?

I'll update the proposal to make this point explicit.

@hajimehoshi
Author

Updated the explainer. Please take a look.

@wanderview
Member

Understood. I guess it still feels weird to me that we don't have a "context closing" event for workers in general, but we do if the workers just happen to be owned by a document. I don't feel strongly enough to argue the point, though. So no objections from me.

@asutherland

It seems like the introduction of a pagehide event requires some explicit concept of a grace period for the worker to finish what it's doing and process an explicitly dispatched new task, plus all the previously enqueued tasks that might have to run first under the existing execution model? And this seems at odds with the idea that dedicated workers can be asked to do long-running work.

ServiceWorkers do provide a precedent for letting content continue to run JS after the user has navigated away, but arguably in that case maybe the relevant app logic should just be using a ServiceWorker in which case it wouldn't be under any time pressure to drop IDB connections, etc.?

Maybe it's different for other browsers, but when Firefox freezes a page, the worker is interrupted mid-JS-execution and all content execution stops until thawed. This is based on the same mechanism for worker termination.

@fergald

fergald commented Oct 27, 2021

@asutherland

Maybe it's different for other browsers, but when Firefox freezes a page, the worker is interrupted mid-JS-execution and all content execution stops until thawed. This is based on the same mechanism for worker termination.

For a worker not running a long task, giving the worker the chance to release resources that would block BFCaching is a win.

For a worker running a long task that would block timely execution of pagehide, we need to consider 2 cases

  1. the worker is holding resources that block BFCaching. It would not be cached with or without pagehide.
  2. the worker is not holding resources that block BFCaching.
    a. Without pagehide, we could just freeze and cache it.
    b. With pagehide, we would attempt to run pagehide and it would not run in a timely manner

So 2b here is tricky. Can we just freeze it anyway and let pagehide run after it comes back out of BFCache and completes the long task? I can't think of a reason why that would be a problem. It seems odd to run pagehide after freezing and unfreezing, but the worker won't be able to tell.

@asutherland
Copy link

To be clear, I'm on board with the potential benefits of letting workers clean up. My concern is how long the window for "timely execution" has to be before we start avoiding case 2b and what the performance implications of granting this grace period to every dedicated worker will be. Presumably the page the user is navigating to would benefit from having those resources for itself!

If we're regularly going to be sending too-late pagehide events to non-idle Workers that aren't written to run in very tiny time-slices, maybe it would be better for the Worker constructor to gain an option like terminateOnPageHideIfNecessary. It would indicate that, if the worker would make the parent ineligible for bfcache, the worker will automatically be terminated and a terminatedByPageHide event dispatched on the Worker that was terminated. Sophisticated worker setups might run the bfcache-friendly logic in the parent and the bfcache-angrifying logic in a nested worker with the flag set.

Alternately, I suppose the spec could be written so that browsers could immediately freeze all the workers until the navigated page is sufficiently loaded and the latency pressure is off from their perspective. Then the browser could thaw the workers on its own schedule and give each a longer opportunity to get to process the pagehide event and clean up whatever needs to be cleaned up. If they don't get to processing the event with this longer opportunity, the page and workers are removed from bfcache. This might even be helpful for situations involving storage APIs where the extra delay could have given them time to complete if the "pagehide" event is intentionally queued only on the thaw-for-pagehide so that the storage tasks are allowed to be queued up in the meantime.
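A rough model of this freeze-then-thaw alternative, under stated assumptions: every worker is frozen at navigation, later thawed with a bounded budget to reach the queued cleanup event, and the page is evicted if any worker still blocks BFcache afterwards. The function name, the task budget, and the representation of a worker's backlog as a count of pending tasks are all hypothetical.

```javascript
// Hypothetical model of the freeze-then-thaw alternative. Each worker is
// { pendingTasks, blocksBfcache, cleanupReleasesResource }:
//   pendingTasks: tasks queued ahead of the cleanup event
//   blocksBfcache: whether the worker currently blocks BFCache
//   cleanupReleasesResource: whether its cleanup handler frees the blocker
// taskBudget: how many tasks a thawed worker may run before it is refrozen.
function thawAndOfferCleanup(workers, taskBudget) {
  // At navigation time: freeze every worker immediately.
  for (const worker of workers) worker.frozen = true;

  // Later, once the new page is loaded: thaw each worker for a bounded period.
  for (const worker of workers) {
    worker.frozen = false;
    if (worker.pendingTasks <= taskBudget) {
      // The worker reached the cleanup event within its budget.
      if (worker.cleanupReleasesResource) worker.blocksBfcache = false;
    }
    worker.frozen = true;
  }

  // The page stays in BFCache only if no worker still blocks it.
  return workers.every((worker) => !worker.blocksBfcache);
}
```

The design point being modelled is that the latency-sensitive navigation never waits on workers; only the later, deferred thaw does.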

@fergald

fergald commented Oct 28, 2021

Thinking about the long-task-worker problem, maybe instead of pagehide/pageshow, we should be doing prepareforbfcache/resumeafterbfcache (ignore the terrible names). We would only send them if the worker is blocking BFCache. If the worker is not blocking BFCache, we can just immediately freeze it.

This removes 2 problems

  • @altimin's question of "should we send pageshow to all workers when they are created and pagehide just as they are destroyed?"
  • the issue with 2b above.

It does not solve the fact that in 1 above we would still allow a worker to consume CPU for some grace period (but that's a problem with pagehide/pageshow too).
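The "only notify workers that block BFCache" idea amounts to partitioning the page's workers. A minimal sketch, where `planBfcacheEntry` and the `{ id, blocksBfcache }` shape are hypothetical:

```javascript
// Hypothetical model: each worker is { id, blocksBfcache }.
function planBfcacheEntry(workers) {
  const notify = [];    // get the prepare-for-BFCache event and a chance to clean up
  const freezeNow = []; // frozen immediately; no event is sent
  for (const worker of workers) {
    if (worker.blocksBfcache) notify.push(worker.id);
    else freezeNow.push(worker.id);
  }
  return { notify, freezeNow };
}
```

Workers in `freezeNow` never see an event, so neither a spurious "first pageshow" nor a delayed event after a long task can reach them.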

I'm not sure that we need to spec the ordering of things. Does the current spec demand that the workers be frozen before the next page starts executing? I can imagine problems here if the user navigates to another page on the same origin and the worker is writing to shared storage but I think those problems already exist with freezing workers and resuming them after BFCache.

@annevk
Member

annevk commented Oct 28, 2021

There is a grace period for shared and service workers and I suppose that could be extended to dedicated workers, though given how prevalent they are that does seem worrisome. And if they have a long-running task, how much is a couple extra seconds going to help?

One thing I wanted to ask is that when designing this feature we also keep shared and service workers in mind so that the solution can scale to them (as well as any dedicated workers they might instantiate).

@fergald

fergald commented Oct 28, 2021

@annevk can you elaborate on the grace period for shared and service workers? What are you waiting for during that grace period? Right now Chrome allows pages with service workers into the cache and does not signal anything to the shared worker. If the worker asks for clients, it will not be told that the page exists, and if it already had a handle to the page and tried to send it a message, the page will be evicted.

@annevk
Member

annevk commented Oct 28, 2021

@fergald I meant that unrelated to bfcache shared/service workers get to exist beyond the lifetime of a document for x seconds, in the event that another document appears for which they can also be used. (Firefox's implementation of these things is a bit in flux still due to site isolation. Firefox also evicts bfcache documents that receive a message.)

@hajimehoshi
Author

So, are we fine with having dedicated events for the BFcache feature rather than pagehide/pageshow, like "beingPutToBFcache" and "restoredFromBfcache"?

As service workers don't prevent pages from being cached to BFcache (at least in Chrome), these new events are not needed for service workers.

I'm not sure whether shared workers should be able to handle the events. Currently, pages using shared workers are not cached (at least in Chrome). Should shared workers' feature usage (like IndexedDB) affect the page's eligibility for BFcache in the future? When should a shared worker be frozen? As there will be a lot of discussion about these, I'd like to keep shared workers out of the scope of my proposal.

@asutherland

So, are we fine to have dedicated events for bfcache-features rather than pagehide/pageshow, like "beingPutToBFcache" "restoredFromBfcache"?

My concern isn't so much about the name, but the steps related to dispatching "pagehide" or "beingPutToBFcache". Maybe it would be good to sketch what the spec algorithm would be for dispatching the event?

As service workers don't prevent pages from being cached to BFcache (at least in Chrome), these new events are not needed for service workers.

Yes, it seems like ServiceWorkers would not be involved with the event.

I'm not sure whether shared workers should be able to handle the events. Currently, pages using shared workers are not cached (at least in Chrome). Should shared workers' feature usage (like IndexedDB) affect the page's eligibility for BFcache in the future? When should a shared worker be frozen? As there will be a lot of discussion about these, I'd like to keep shared workers out of the scope of my proposal.

Firefox freezes a shared worker when all of the documents in its owner set are frozen. It seems reasonable to me that a SharedWorker would receive the event when there is only one unfrozen document in its owner set and that owner is moving to be frozen. Should a frozen document in bfcache be messaged by the unfrozen SharedWorker, the document will be removed from bfcache/discarded. In Firefox (and presumably any multi-process browsers), everything involving SharedWorkers is inherently async, but should roughly look like the async handling of a dedicated worker, so I think it would be appropriate to consider SharedWorkers at the same time.
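The SharedWorker rule described here can be sketched as a check on the owner set. The `{ id, frozen }` document records and the function name are hypothetical; the sketch only encodes "notify when the last unfrozen owner is about to be frozen".

```javascript
// Hypothetical model: a SharedWorker's owner set is an array of documents,
// each { id, frozen }. Returns true if freezing `doc` should notify the worker.
function sharedWorkerShouldBeNotified(ownerSet, doc) {
  const unfrozen = ownerSet.filter((d) => !d.frozen);
  // Notify only when `doc` is the last unfrozen owner; once it is frozen,
  // every owner is frozen and the shared worker will be frozen too.
  return unfrozen.length === 1 && unfrozen[0] === doc;
}
```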

@hajimehoshi
Author

Thanks (and sorry for the very late reply).

Should a frozen document in bfcache be messaged by the unfrozen SharedWorker, the document will be removed from bfcache/discarded. In Firefox (and presumably any multi-process browsers), everything involving SharedWorkers is inherently async, but should roughly look like the async handling of a dedicated worker, so I think it would be appropriate to consider SharedWorkers at the same time.

I see, considering shared workers at the same time makes sense. Before updating the explainer, we have to consider how Chrome/Chromium would cache pages with shared workers... CC @fergald

@fergald

fergald commented Nov 10, 2021

@asutherland the distinction between "pagehide" or "beingPutToBFcache" is not just in the name. "beingPutToBFcache" means that we wouldn't fire these on first pageshow and we wouldn't fire them on pagehide if the page is not going into BFCache. They stop being about what the page is doing and instead are about what's about to happen to the worker (which is driven by what the page is doing but the distinction is far more important for shared workers).

As for shared workers in Chrome, we don't cache them currently. They account for less than .01% (1/1000) of reasons we didn't use BFCache (about half of those are blocked by something else too). Handling them would be complex, so I doubt Chrome will ever implement support for bfcaching them. WebKit removed them, so I expect only FF has them as a practical concern.

I do agree that we should sketch out the dispatch algorithm in the explainer but unless you think we are going to discover a reason to change approach entirely, I'd like to keep the scope to dedicated workers, unless someone from FF wants to collaborate. If firing an event in a shared worker is not going to work, I don't know what would, so I don't think we need to get the shared worker story correct before making progress on dedicated workers.

@asutherland

I support the refined semantics for "beingPutToBFCache". And thank you for the explicit restatement of the semantics; those make sense and I benefited from the clarity.

In terms of the SharedWorker and the event, it sounds like we all agree that the event would work for the SharedWorker if we wanted to generalize to that/support that, and that's mainly what I wanted consensus on. Given that only Firefox would support BFCaching of documents using SharedWorkers, I think it likely makes sense to just specify that SharedWorkers make a document ineligible for BFCaching in the interest of webcompat. We can always relax that in the future should there be interest in implementing it across browsers and a belief that it would meaningfully improve successful bfcaching.

And in that case the explainer and subsequently spec only need to deal with dedicated workers. I very much look forward to the next steps of this, thank you!

@hajimehoshi hajimehoshi changed the title Proposal: pagehide and pageshow events for DedicatedWorkerGlobalScope Proposal: beforePutToBFcache and afterRestoreFromBFcache events for DedicatedWorkerGlobalScope Nov 15, 2021
@hajimehoshi
Author

I've updated the explainer based on the discussion. I haven't come up with a better event name...

Please take a look, thanks!

@hajimehoshi
Author

As there seem to be no objections to my proposal (except perhaps the name?), I'll draft a proposal for the spec. Thank you very much!

@hajimehoshi
Author

Maybe it's different for other browsers, but when Firefox freezes a page, the worker is interrupted mid-JS-execution and all content execution stops until thawed. This is based on the same mechanism for worker termination.

What about Safari? Does anyone have insights?

@hajimehoshi
Author

@rakina @domenic @nhiroki

Now I'm trying to work out how to patch the current spec (WHATWG), and I'd appreciate any insights. My idea is:

  • When a page is going into BFcache (at the same timing as pagehide), the browser may send a beforePutToBFcache event to dedicated workers.
    • It's OK if the browser sends the event to all the workers, including workers that don't use a blocking feature. This might be more than necessary, but it should not be harmful. It's also OK if the browser sends nothing, as the event is just a kind of optimization: the page won't be put into BFcache if some blocking feature is still in use.
    • The browser might not be able to send the event due to a long-running task. That's also OK: in this case the browser won't put the page into the cache, if it doesn't want to suspend the worker in the middle of the task.
  • When a page is being restored from BFcache (at the same timing as pageshow), the browser sends an afterRestoreFromBFcache event to dedicated workers if and only if beforePutToBFcache was sent.
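The last rule (afterRestoreFromBFcache only for workers that received beforePutToBFcache) implies some per-page bookkeeping. A minimal sketch, where `BfcacheEventTracker` and its methods are hypothetical names and workers are represented by plain IDs:

```javascript
// Hypothetical per-page bookkeeping: afterRestoreFromBFcache is dispatched
// only to workers that were actually sent beforePutToBFcache.
class BfcacheEventTracker {
  constructor() {
    this.notified = new Set(); // workers sent beforePutToBFcache
  }
  // Called when the page goes into BFcache; `sent` lists the workers that
  // received beforePutToBFcache (the browser may have skipped some).
  onPutToBfcache(sent) {
    this.notified = new Set(sent);
  }
  // Called on restore; returns the workers that get afterRestoreFromBFcache.
  onRestoreFromBfcache(allWorkers) {
    const targets = allWorkers.filter((w) => this.notified.has(w));
    this.notified.clear(); // a second restore without a new put sends nothing
    return targets;
  }
}
```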

Thanks,

@domenic
Member

domenic commented Dec 13, 2021

I don't really understand why there are so many "mays". It sounds like that makes these events completely unreliable, and e.g. a browser that never sends them would be compliant with the spec. That doesn't seem like a useful feature to me. We would also be able to write zero web platform tests.

The original semantics as I understood it was that, whenever we would send pagehide with persisted = true to a document, we would send this new event to all dedicated workers owned by the document. I think we should be able to queue such an event regardless of blocking tasks or optimizations, so that we can have a rigorous and testable feature.

@hajimehoshi
Copy link
Author

The original semantics as I understood it was that, whenever we would send pagehide with persisted = true to a document, we would send this new event to all dedicated workers owned by the document. I think we should be able to queue such an event regardless of blocking tasks or optimizations, so that we can have a rigorous and testable feature.

When there is a long-running task in a dedicated worker, the browser might decide not to put the page into the cache without sending the event to dedicated workers. So I used the word 'may'.

I'm not sure we need to use a 'queue' model. If the browser decides not to put the page into the back/forward cache, the queue is cleared so that the events are not sent, right? As back/forward cache timings depend on the browser, the browser may decide not to put the page into the cache and clear the queue, but is this better than my suggestion (#7216 (comment))?

@fergald

fergald commented Dec 14, 2021

I don't really understand why there are so many "mays". It sounds like that makes these events completely unreliable, and e.g. a browser that never sends them would be compliant with the spec. That doesn't seem like a useful feature to me. We would also be able to write zero web platform tests.

Does it make sense for a browser that doesn't support BFCache to send these anyway? It could be a "yes". I was going to say it could also be "realistically there is no such browser anymore", but I'm not sure whether Edge has enabled BFCache for real yet, and it seems like FF's current strategy is to non-cooperatively freeze tasks (not sure about Safari), so they would be out of spec until they decide to enable this optimisation.

Does it make sense to send this event to a worker after we have already been into and back out of BFCache? Maybe for consistency it does.

If you think the answer to those is "yes" then I think it needs a clear explanation (that we should add to the explainer).

I'm not sure I'm convinced on the testing argument. If you think of this as an optional feature, then we can write tests for Chrome that will show on WPT that we support it (and in our own CI we will fail if suddenly support is dropped) and other browsers will show in WPT that they do not support this optional feature.

It is frustrating that BFCache is introducing a lot of optionality when there was very little in WP before but I'm not sure that that's wrong.

The original semantics as I understood it was that, whenever we would send pagehide with persisted = true to a document, we would send this new event to all dedicated workers owned by the document. I think we should be able to queue such an event regardless of blocking tasks or optimizations, so that we can have a rigorous and testable feature.

I think the original semantics didn't really account for the problem of long-running tasks in workers.

@domenic
Member

domenic commented Dec 14, 2021

When there is a long-running task in a dedicated worker, the browser might decide not to put the page into the cache without sending the event to dedicated workers. So I used the word 'may'.

OK, so the idea is:

  1. If the page is going in the bfcache, i.e. we send pagehide with persisted = true, then we definitely send the beforePutToBFCache event to all dedicated workers
  2. But if the page is not going in bfcache, i.e. we send pagehide with persisted = false, then we definitely don't send the beforePutToBFCache event to all dedicated workers

and maybe a reason for ending up in case (2) is because of a long blocking event in workers. Is that correct? If so that sounds pretty good. We can still write tests in the "if-then" format, i.e. if pagehide with persisted = true, then the event fires in the worker; if pagehide with persisted = false, then no event in the worker.
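The "if-then" test format can be expressed as a small oracle mapping the document's page transition event to the expected worker event. `expectedWorkerEvent` is a hypothetical helper for illustration, not part of any test harness; it also reflects the refined semantics from earlier in the thread that nothing fires on the first pageshow (persisted = false).

```javascript
// Hypothetical test oracle: which worker event, if any, should accompany a
// page transition event in the owning document?
function expectedWorkerEvent(documentEventType, persisted) {
  if (!persisted) return null; // not entering or leaving BFCache: no worker event
  if (documentEventType === "pagehide") return "beforePutToBFcache";
  if (documentEventType === "pageshow") return "afterRestoreFromBFcache";
  return null;
}
```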

I'm not sure we should have to use a 'queue' model. If the browser decides not to put the page into the back/forward cache, the queue is cleared so that the events are not sent, right?

Yes, if you dispose of the document, then you also shut down all of its dedicated workers, and it's possible the event won't be fired, even though it is queued.

Does it make sense for a browser that doesn't support BFCache to send these anyway?

No, that wasn't what I was trying to imply. If you are in case (2) then there is no need to send the event.

Does it make sense to send this event to a worker after we have already been into and back out of BFCache?

Interesting case. So the sequence is:

  • [Document] Go into bfcache
  • [Worker] Queue the event
  • [Document] Go out of bfcache
  • [Worker] The queue is processed and we are about to fire the event

Can we do a cross-thread check in this last step to see if the document is in bfcache or not? If so, then yeah, maybe avoiding firing the event would make sense in such cases.
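The four-step sequence above, with the check made when the queued task runs rather than when it is queued, can be sketched as a single-threaded Python model. `DocumentState` and `WorkerEventLoop` are hypothetical names; a plain attribute stands in for whatever cross-thread mechanism a browser would actually use.

```python
from collections import deque


class DocumentState:
    """Hypothetical view, readable from the worker, of whether the document is in bfcache."""

    def __init__(self) -> None:
        self.in_bfcache = False


class WorkerEventLoop:
    """Hypothetical worker event loop that runs queued tasks in FIFO order."""

    def __init__(self, doc: DocumentState) -> None:
        self.doc = doc
        self.tasks: deque = deque()
        self.fired: list[str] = []

    def queue_before_put_to_bfcache(self) -> None:
        def task() -> None:
            # Check the document state when the task runs, not when it was
            # queued: if the document already left bfcache, the event is moot.
            if self.doc.in_bfcache:
                self.fired.append("beforePutToBFCache")

        self.tasks.append(task)

    def run_pending(self) -> None:
        while self.tasks:
            self.tasks.popleft()()


doc = DocumentState()
loop = WorkerEventLoop(doc)
doc.in_bfcache = True               # [Document] Go into bfcache
loop.queue_before_put_to_bfcache()  # [Worker] Queue the event
doc.in_bfcache = False              # [Document] Go out of bfcache
loop.run_pending()                  # [Worker] Process the queue; the event is skipped
print(loop.fired)                   # []
```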

@fergald

fergald commented Dec 15, 2021

That description is a little off, I think.

Chrome's v1 of supporting dedicated workers is going to be the same as FF (freeze workers mid-task, don't BFCache if they are doing something that blocks BFCache).

Chrome's v2 would add the new event but still immediately freeze workers that are not blocking BFCache. So it would look like this:

```
for worker in page.workers:
  if HasBlockingResources(worker):
    # Hopefully the event will be delivered to the worker and it will release resources.
    QueueEventAndCooperativelyFreeze(worker)
  else:
    # Stop it using CPU right now, making the navigation to the new page smoother and giving us a better chance
    # of releasing resources in other workers
    ImmediatelyFreeze(worker)

wait(SOME_TIMEOUT)

for worker in page.workers:
  if HasBlockingResources(worker):
    # The worker still has blocking resources.
    MarkNonSalvageable(page)
  if not IsFrozen(worker):
    # The worker did not run the event handler and freeze within the timeout.
    MarkNonSalvageable(page)
```
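The pseudocode above can be turned into a runnable Python simulation under some assumptions: the `Worker` dataclass is hypothetical, `wait(SOME_TIMEOUT)` is modeled deterministically by a `responsive` flag (whether the worker gets to run its handler before the timeout), and instead of calling `MarkNonSalvageable(page)` the function simply returns whether the page is still salvageable.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical worker model used to simulate the freezing strategy."""
    name: str
    has_blocking_resources: bool
    releases_on_event: bool = True  # its handler drops the blocking resources
    responsive: bool = True         # it is not stuck in a long-running task
    frozen: bool = False
    event_queued: bool = False


def queue_event_and_cooperatively_freeze(worker: Worker) -> None:
    worker.event_queued = True


def immediately_freeze(worker: Worker) -> None:
    worker.frozen = True


def simulate_timeout(workers: list[Worker]) -> None:
    # Stand-in for wait(SOME_TIMEOUT): responsive workers run the queued event
    # handler (possibly releasing resources) and then freeze; others do not.
    for worker in workers:
        if worker.event_queued and worker.responsive:
            if worker.releases_on_event:
                worker.has_blocking_resources = False
            worker.frozen = True


def page_is_salvageable(workers: list[Worker]) -> bool:
    for worker in workers:
        if worker.has_blocking_resources:
            queue_event_and_cooperatively_freeze(worker)
        else:
            immediately_freeze(worker)
    simulate_timeout(workers)
    # Equivalent to the second loop: any worker that still holds blocking
    # resources or is not frozen would trigger MarkNonSalvageable(page).
    return all(w.frozen and not w.has_blocking_resources for w in workers)


ok = [Worker("idle", False), Worker("indexeddb", True)]
stuck = [Worker("idle", False), Worker("busy", True, responsive=False)]
print(page_is_salvageable(ok))     # True
print(page_is_salvageable(stuck))  # False
```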

I think this allows for the best caching and user experience, but should we spec that? FF and Safari don't currently do this and may never add the event. Safari currently has a bug where it doesn't freeze long-running workers (bug filed), and it looks like the intention is to just not cache (at least as the immediate fix). @cdumez.

The simplest way to express that FF, Safari and Chrome are all in spec seemed to be to make queuing the event optional, which also covers the case of not running it in workers that get frozen immediately.

Can we do a cross-thread check in this last step to see if the document is in bfcache or not? If so, then yeah, maybe avoiding firing the event would make sense in such cases.

If you think we should spec it so that we say that we always queue the event but check and cancel if the page is active, that's fine. I don't think that's how we would implement it (since we know in advance whether we will cancel it and also, that cross-thread check ). The experience for devs is the same.

@asutherland

I'm +1 on @fergald's pseudo-code in #7216 (comment) and this seems like something Firefox could/would implement.

One necessary issue to address is how to deal with nested workers. In Firefox we're currently at a point where we may be refactoring aspects of how nested workers work, so we could potentially make the algorithm communicate directly with the nested workers; but currently I would expect this algorithm to need to run on each worker in turn, with each worker then messaging the workers it owns.
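The "run on each worker in turn" propagation for nested workers can be sketched in Python as a depth-first walk. `NestedWorker` and `deliver` are hypothetical names, and the direct recursive call is a simplification: a real implementation would queue a task or message on each owned worker rather than call into it synchronously.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NestedWorker:
    """Hypothetical worker that may itself own dedicated workers."""
    name: str
    children: list[NestedWorker] = field(default_factory=list)
    received: list[str] = field(default_factory=list)

    def deliver(self, event: str, trace: list[str]) -> None:
        # Handle the event in this worker first, then forward it to the workers
        # this worker owns; the recursion stands in for per-worker messaging.
        self.received.append(event)
        trace.append(self.name)
        for child in self.children:
            child.deliver(event, trace)


root = NestedWorker("top", [NestedWorker("mid", [NestedWorker("leaf")]),
                            NestedWorker("sibling")])
trace: list[str] = []
root.deliver("beforePutToBFCache", trace)
print(trace)  # ['top', 'mid', 'leaf', 'sibling']
```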

moot events

I think it would be desirable for the event to be canceled if it becomes moot by the document coming out of bfcache.

I expect that we (Firefox) would update an Atomics-style synchronized variable on the main-thread Worker binding, which the worker task that dispatches the event would check before actually dispatching it. In other words, I think we'd want to avoid giving the task on the worker the ability to synchronously know whether the document is in the bfcache. Instead, the worker's owning main-thread Worker binding would have a potentially more up-to-date understanding of this state than sequential task ordering allows; the algorithm running on the worker would not be able to do anything on the main thread, only read this cached value. (This could also allow an approach where the state is conveyed not via an atomic but via a separate high-priority task queue.)

I think this could also apply to HasBlockingResources, which is probably not 100% knowable from the Worker's owning thread (e.g., the main thread) that's calling QueueEventAndCooperativelyFreeze unless we specify an atomic-based coordination mechanism for this too. This could be re-checked on event dispatch.
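The dispatch-time check described in the two paragraphs above can be sketched in Python with a real worker thread. Python has no JavaScript `Atomics`, so `threading.Event` stands in for the synchronized variable (an assumption); `WorkerBinding`, `worker_main` and `run_scenario` are hypothetical names. A `threading.Event` placed on the task queue simulates a long-running task, so the document can be restored while the event task is still waiting.

```python
import queue
import threading


class WorkerBinding:
    """Hypothetical main-thread Worker binding.

    threading.Event stands in for an Atomics-style synchronized variable:
    the main thread writes the flags; the worker thread only reads them.
    """

    def __init__(self) -> None:
        self.document_in_bfcache = threading.Event()
        self.has_blocking_resources = threading.Event()


def worker_main(binding: WorkerBinding, tasks: queue.Queue, fired: list) -> None:
    while True:
        task = tasks.get()
        if task is None:                      # shutdown sentinel
            return
        if isinstance(task, threading.Event):
            task.wait()                       # simulate a long-running task
            continue
        # Re-check both cached flags at dispatch time, not at queue time.
        if (binding.document_in_bfcache.is_set()
                and binding.has_blocking_resources.is_set()):
            fired.append(task)


def run_scenario(restore_before_dispatch: bool) -> list:
    binding = WorkerBinding()
    tasks: queue.Queue = queue.Queue()
    fired: list = []
    thread = threading.Thread(target=worker_main, args=(binding, tasks, fired))
    thread.start()

    long_task_done = threading.Event()
    binding.has_blocking_resources.set()
    binding.document_in_bfcache.set()          # [Document] go into bfcache
    tasks.put(long_task_done)                  # worker is busy in a long task
    tasks.put("beforePutToBFCache")            # [Worker] queue the event
    if restore_before_dispatch:
        binding.document_in_bfcache.clear()    # [Document] restored first
    long_task_done.set()                       # the long task finishes
    tasks.put(None)
    thread.join()
    return fired


print(run_scenario(restore_before_dispatch=False))  # ['beforePutToBFCache']
print(run_scenario(restore_before_dispatch=True))   # []
```

The result is deterministic because the worker can only reach the flag check after `long_task_done.set()`, which the main thread calls after any `clear()`.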

@domenic
Member

domenic commented Jan 4, 2022

Your pseudocode is really helpful in clarifying things, and it's good to hear that @asutherland likes it too. I wonder if it'd be worth including such pseudocode in the spec as a non-normative example, if nothing else, just because it clarifies the design goal a good deal...

I agree with the general tension between allowing browsers the flexibility to do different things in these bfcache cases, versus wanting to spec something interoperable and which could be tested in a reasonable fashion. It's a hard balance.

In particular it sounds like if the intent is to not fire the event for workers without blocking resources, but instead freeze them immediately with no event, then we can't really specify anything rigorous or testable. (Unless we specify "blocking resources", which is probably not worth doing?)

I'm still trying to tease out whether we can have anything more rigorous/testable than just making the event optional. If we have to end up there, then that's OK. But it still seems like we could at least have some negative invariants expressed in the spec, such as:

  • The worker event never fires if the document has become fully active by the time it would normally fire
  • The worker event never fires if the document is deemed unsalvageable
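The two negative invariants above could be checked against an observed event trace. This Python sketch assumes hypothetical trace labels: "pagehide" and "pageshow" stand for the persisted = true variants, "unsalvageable" for the document being deemed unsalvageable, and "worker_event" for the worker event firing; `invariant_violations` is likewise a hypothetical helper.

```python
def invariant_violations(trace: list[str]) -> list[str]:
    """Check a hypothetical event trace against the two negative invariants."""
    became_fully_active = False
    unsalvageable = False
    violations: list[str] = []
    for index, entry in enumerate(trace):
        if entry == "pagehide":
            became_fully_active = False
        elif entry == "pageshow":
            became_fully_active = True   # restored: document is fully active again
        elif entry == "unsalvageable":
            unsalvageable = True
        elif entry == "worker_event":
            if became_fully_active:
                violations.append(f"{index}: fired after the document became fully active again")
            if unsalvageable:
                violations.append(f"{index}: fired after the document was deemed unsalvageable")
    return violations


print(invariant_violations(["pagehide", "worker_event", "pageshow"]))  # []
print(invariant_violations(["pagehide", "pageshow", "worker_event"]))
# ['2: fired after the document became fully active again']
```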

Can you think of any other such invariants, either negative or positive?

At this point I think we might be ready for spec text though, as it's getting pretty concrete and I think I understand the constraints.

@asutherland

I agree with the general tension between allowing browsers the flexibility to do different things in these bfcache cases, versus wanting to spec something interoperable and which could be tested in a reasonable fashion. It's a hard balance.

Maybe we could provide insight and make things testable by:

  • Defining performance timeline entries associated with the algorithm decision steps here as they relate to workers, explicitly including cases where we do not dispatch an event. w3c/performance-timeline#182 ("API shape for monitoring new navigation-like entries such as App History, BFCache, etc.") already explicitly discusses adding entries for BFCache purposes.
  • Building on web-platform-tests/wpt#29803 ("Add support for messaging between different browsing contexts") and related work, which seems to give us the ability to uniquely identify a window or worker global (client), to let WPT tests subscribe to performance timeline entries for a global, such that even if the bfcache algorithm chooses to discard the global (and therefore the entries would not be observable to content), the test (which was driven by another global) can still see the entries.
  • Browsers' Devtools could also provide a means of letting developers obtain data even in the face of globals being discarded during normal use.
  • One could also imagine some means of letting a ServiceWorker hear about the performance entries that explain why a page couldn't make it into BFCache.
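To make the first idea above concrete, here is a Python sketch of what per-worker decision entries might record, including the "frozen immediately, no event dispatched" case. The entry shape, the `WorkerDecision` labels and the timestamps are all hypothetical illustrations; they are not the shape proposed in w3c/performance-timeline#182 and not an existing `PerformanceEntry` type.

```python
from dataclasses import dataclass
from enum import Enum


class WorkerDecision(Enum):
    """Hypothetical labels for the decision steps of the worker freezing algorithm."""
    FROZEN_IMMEDIATELY = "frozen-immediately"  # no event dispatched
    EVENT_QUEUED = "event-queued"
    STILL_BLOCKING = "still-blocking"          # page marked non-salvageable
    NOT_FROZEN_IN_TIME = "not-frozen-in-time"  # page marked non-salvageable


BLOCKING_DECISIONS = {WorkerDecision.STILL_BLOCKING, WorkerDecision.NOT_FROZEN_IN_TIME}


@dataclass(frozen=True)
class WorkerBFCacheEntry:
    """Hypothetical timeline entry for one decision about one worker."""
    worker: str
    decision: WorkerDecision
    time_ms: float


def blocking_workers(entries: list[WorkerBFCacheEntry]) -> list[str]:
    """Workers whose entries explain why the page could not be cached."""
    return [e.worker for e in entries if e.decision in BLOCKING_DECISIONS]


# Illustrative input only, not measured data.
entries = [
    WorkerBFCacheEntry("idle-worker", WorkerDecision.FROZEN_IMMEDIATELY, 0.0),
    WorkerBFCacheEntry("idb-worker", WorkerDecision.EVENT_QUEUED, 0.0),
    WorkerBFCacheEntry("idb-worker", WorkerDecision.STILL_BLOCKING, 50.0),
]
print(blocking_workers(entries))  # ['idb-worker']
```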

@fergald

fergald commented Jul 19, 2022

Chrome has finished rolling out dedicated worker support for BFCache, where we just freeze the workers. What we see is that dedicated workers now block about 0.01% of eligible navigations. These could be fixed with the event proposed in this explainer; however, given how small this is, our priority for speccing and implementing this is going to be quite low unless we find another strong reason to push it forward.
