
24 hour spontaneous update doesn't seem specced #514

Closed
mfalken opened this issue Oct 16, 2014 · 50 comments

@mfalken commented Oct 16, 2014

A few points related to the 24 hour update:

  • There's a note that: "update() pings the server for an updated version of this script without consulting caches. This is conceptually the same operation that UA does maximum once per every 24 hours."
    But I don't see where this operation is defined.
  • The "without consulting caches" part seems wrong. update() leads to Update, which busts the browser cache only once every 24 hours.
  • The Soft Update algorithm mentions: "The user agent may call this as often as it likes to check for updates." This seems to contradict the "maximum once every 24 hours" statement (maybe that is meant to be "at least once..."?)
  • I realize that Fetch schedules a Soft Update, and Update busts the browser cache every 24 hours. But that's different from a spontaneous 24-hour update, no? (See the sketch after this list.)
  • nit: It might be more consistent and precise to change the "forcing a network fetch if cached entry is greater than 1 day old" to "greater than 24 hours old".
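
To make the rule under discussion concrete, here is a minimal sketch of the "bust the browser cache at most once per 24 hours" behavior, assuming a hypothetical per-registration `lastNetworkFetchTime`; none of these names are spec terms:

```js
// Hypothetical sketch, not spec text: when an update check runs, it bypasses the
// HTTP cache only if the cached script entry is more than 24 hours old.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function shouldBypassHttpCache(lastNetworkFetchTime, now = Date.now()) {
  // lastNetworkFetchTime: when the SW script was last fetched from the network
  return now - lastNetworkFetchTime > ONE_DAY_MS;
}
```
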
@jakearchibald

/me sounds the horn for @annevk, @slightlyoff, @jungkees, @sicking

There seems to be a lot of confusion around this. Here's what I always thought was the intention here:

Appcache checks for updates per navigation. This works pretty well. What doesn't work well is that if you set a long max-age on the manifest, it respects it and you end up with lock-in.

With ServiceWorker we kept the update-per-navigation part, and protected against lock-in by respecting max-age (etc.) but capping it at 24 hours.

The UA may check for updates more regularly if it wishes, but it is not required to.

I've seen some threads suggesting the SW expires after 24 hours; that would be disastrous if you're offline for more than 24 hours.

@mfalken commented Oct 17, 2014

I may have just been confused. I thought the spec intended to mandate a 24 hour spontaneous update.

FWIW the current Chromium implementation does as Jake explains above: update-on-navigate with 24 hour cache busting, and no (optional) spontaneous update.

The spec could still be clearer on the points I mentioned above.

@jakearchibald

@mattto as you point out, there are a couple of parts in the spec that suggest some kind of special 24hr auto-update. I think there's just been a lot of confusion around this. Hopefully we can settle it here.

@annevk commented Oct 17, 2014

This doesn't work if you never navigate to a page but do get push messages from it. We still don't want the service worker to be older than 24h in that scenario, so this seems broken.

@jakearchibald

Wouldn't it make more sense for the push spec to trigger updates at the appropriate time? When push messages are received, for example.

I want to avoid running updates for sites the user doesn't use/visit.

@annevk commented Oct 17, 2014

That would mean every single extension specification would have to deal with running Soft Update. That's another reason why this coupling of FetchEvent and service workers is broken.

@annevk commented Oct 17, 2014

So to be clear, that's not an acceptable solution.

@jakearchibald

Had a chat with the team working on Push. How about the following (a rough sketch follows the list):

  • Update is called whenever the ServiceWorker spins up, if we haven't checked for updates within the previous 24hrs
  • Update is called per navigation (so many can happen within a 24hr window)
  • Specs that build on SW can call update more frequently if they wish
  • The UA can call update more frequently if it wishes
  • Updates obey max-age headers (up to a 24hr max)
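
A rough sketch of the trigger logic in that list, treating `lastUpdateCheckTime` as a hypothetical per-registration timestamp and `softUpdate()` as a stand-in for the spec's Soft Update algorithm:

```js
// Hypothetical sketch, not spec text.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function softUpdate(registration) {
  // Placeholder for the spec's Soft Update algorithm: fetch the SW script,
  // compare it, and kick off installation if it changed.
  console.log('checking for an updated script for', registration.scope);
}

// The UA runs an update check when the SW is about to spin up,
// but only if no check has happened within the last 24 hours.
function onServiceWorkerStartup(registration, now = Date.now()) {
  if (now - registration.lastUpdateCheckTime >= ONE_DAY_MS) {
    softUpdate(registration);
  }
}

// Navigations always schedule a check, so many can happen within a 24h window.
function onNavigation(registration) {
  softUpdate(registration);
}
```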

@annevk commented Oct 17, 2014

If the specification actually defined the life cycle, you would run Soft Update just before you plan on booting the service worker (for whatever reason) if the time was up (or if the UA decided it wanted to check at that point, that should also be allowed).

That way you avoid locking yourself to any particular thing such as navigation or push or some such, as that just doesn't make much sense.

@jakearchibald

I think we're suggesting the same thing. (Before/after doesn't matter so much; it won't wait for the update to complete before delivering events. The only state that delays events is 'activating'.)

@annevk commented Oct 17, 2014

Yeah, I guess we do. I don't think we need:

  • Update is called per navigation (so many can happen within a 24hr window)
  • Specs that build on SW can call update more frequently if they wish

We already give UAs the freedom to update whenever they like, and that way we don't intertwine service worker features with the service worker itself.

@jakearchibald

I think update-on-navigation is important. If I've released a buggy version I don't want my users to be locked into 24hrs of brokenness. I want to be able to install a new version on navigation, and if need be .replace the old one.

If I truly believe in myself, I can use max-age to reduce the check frequency.
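
For example (a Node-style sketch under assumed file names, illustrative values only), a site confident in its releases could serve the SW script with a modest max-age so update checks rarely hit the network, knowing the UA caps the effective freshness at 24 hours regardless:

```js
// Hypothetical server sketch: serve ./sw.js with Cache-Control: max-age=3600.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  if (req.url === '/sw.js') {
    res.setHeader('Content-Type', 'application/javascript');
    res.setHeader('Cache-Control', 'max-age=3600'); // one hour; 24h is the hard cap
    fs.createReadStream('./sw.js').pipe(res);
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8080);
```
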

@slightlyoff

The UA needs to be free to check for freshness whenever it wants. The constraint in the spec is that WHEN it does this (whenever that is), it skips the cache if the previous response is more than 24 hours stale. I'm not keen to get deeper into these weeds. If clarification is needed about how the update logic skips the cache, that's great, but this was intentionally left to UAs to design policies for.

@annevk commented Oct 21, 2014

@slightlyoff are you saying it should not be required to check for freshness when navigating as @jakearchibald is suggesting? I'm getting confused.

@jakearchibald

We're having a ServiceWorker hack day in the office today. People are relying heavily on update checks on navigate. We can't remove this without destroying developer experience.

@KenjiBaheux

Blink is implementing the behavior as it was intended. Namely,

  • spontaneous updates happen on navigation
  • we respect max-age (if it's <= 24 hours)
  • we bust the cache if age > 24 hours

Let me add "no impact (blink)" assuming that this still is the desired behavior (seems likely).
If a different conclusion is reached feel free to remove the "no impact (blink)" label so that I can file a bug against blink's implementation.

@annevk commented Oct 22, 2014

@KenjiBaheux @jakearchibald mentioned earlier that busting the cache was not desirable. If you went without connectivity for longer than 24h you'd still want to get the application.

@slightlyoff

@annevk: I think you misunderstand.

The intent isn't to bust the caches for navigations to an active SW, only to bust caches for background checks to see if there's an updated version of the SW script. Until (and unless) there's an updated SW which runs through its install phase without issue, the previously active SW CONTINUES to be active and cached, meaning that the application will continue to work (including in cases where the SW update requests fail or error for some reason).

It'd be a pretty poor offline system if we pulled the rug out from under people when they need it most ;-)

Closing.

@annevk commented Oct 22, 2014

Why did you close this? It's still not actually defined.

@annevk reopened this Oct 22, 2014
@michael-nordman

Should the update algo have a mechanism to evict a registration for which the sw script has been removed from the server? At present, any unacceptable server response to the request for the sw script results in the existing registration remaining.

@mfalken commented Oct 24, 2014

@michael-nordman I think that was decided in #204, "Update algorithm should unregister SW on 404 and 410 errors" (the conclusion was no).

For this issue, it sounds like all that's remaining is for the spec to define auto-update more clearly?

@KenjiBaheux

@annevk Sorry for the confusion. Confirming that Blink is doing what @slightlyoff says: if we don't have a successful update, we keep the existing registration. So if you go offline past the 24-hour window, there won't be any service interruption.

@sicking commented Oct 24, 2014

Since I was asked to comment, I think we should have the following goals:

  • A developer shouldn't have to do anything in order to have a SW which updates automatically. I.e. the default policy should be a good one.
  • A developer should be able to adjust the number of requests for the SW script resource by using normal http headers.
  • However we should limit the damage of accidentally sending a really long expiration header by always checking at least every 24h for an actively used SW.
  • A developer should be able to force an update at any time by explicitly calling some API.

This is useful if the developer gets out-of-band information about a new version being available. For example, a developer should be able to send a push message which then causes the SW to update itself. Or if a chat app keeps a WebSocket connection for bidirectional communication, it should be able to send a command from the server to the client indicating that a new version was just released containing a security fix that needs to be downloaded immediately. (See the sketch at the end of this comment.)

To accomplish these goals I think we should define that:

  • We do a "update check" any time the user navigates to URL which is within the SWs scopes.
  • We do a "update check" any time a SW handles an non-fetch/non-message event, such as events for an incoming push message or geofencing message. These might not trigger a navigation so it's important that these also do update checks.
  • When we do an "update check", if we haven't actually hit the network to verify that the SW script resource is up-to-date we always hit the network. No matter what http semantics prescribe.
  • Otherwise, doing an "update check" follow normal http semantics. Potentially with the exception that we don't do heuristics based on last-modified date. I don't feel strongly about that, but I think it should be explicitly defined one way or another.
  • A call to update() should not cause a normal "update check". Instead it should always hit the network no matter what http caching headers has otherwise indicated. I.e. if the server has indicated that the expiration for the SW script is 1 hour, and we checked with the server 5 minutes ago, a call to update() should still cause a network request to verify that the script is up-to-date.
  • A call to update() should cause the server to get hit even if there's currently a SW in the installing state. I.e. even when "installing worker" is not null. Otherwise there's a risk of race where if we started installing a SW update right before a security update was released, that once the client sees that a security update has shipped and calls update(), that that becomes a no-op since we're still installing the pre-security-update version. It seems fine however to serialize and only hit the network after installation of the previous version finishes, though slightly less optimized.
  • Define that a UA has the the right to do additional "update check"s whenever it wants. But that this should generally follow the normal "update check" semantics defined above. I.e. it should generally respect http headers except when those dictate longer than 24 hour update periods. Though maybe we could also allow the UA should be allowed to bypass any http header limits if needed?

Right now very little of this seems to be defined by the spec. Even the 24 hour limit seems to only be mentioned in a non-normative note.
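
As a hedged illustration of the push-driven scenario above (the message payload shape is hypothetical, and whether update() bypasses the HTTP cache is exactly what's under discussion):

```js
// Service-worker-side sketch: a push message tells the SW that a new version
// (e.g. one containing a security fix) is available, and the SW asks the UA to
// re-check its own script.
self.addEventListener('push', (event) => {
  const data = event.data ? event.data.json() : {}; // payload shape is hypothetical
  if (data.type === 'new-version-available') {
    event.waitUntil(self.registration.update());
  }
});
```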

@jungkees added this to the Version 1 milestone Mar 23, 2015
@jungkees

Thanks for review @slightlyoff! Closing.

@mfalken commented Aug 13, 2015

I'm surprised that update() no longer forces bypassing the HTTP cache, as https://code.google.com/p/chromium/issues/detail?id=520068 says. I don't recall deciding that at the F2F? What was the rationale?

I would think that if an author writes update(), they expect it to ping the server for the latest script.

@jungkees

I'm not entirely sure we've talked about whether .update() also applies the same rule, but having checked with @jakearchibald, it seems fine to apply the same rule to all the updates gated on the Update algorithm. Devs can now adjust the cap by setting the Cache-Control max-age header themselves, and they could control this when an immediate update is needed. The reason we changed it this way is mostly consistency among the different update requests. .update({ cache-mode: "reload" }) // or "default" seems to be another way to handle this? WDYT?

@mfalken commented Aug 13, 2015

Relying on setting max-age doesn't sound right. You may have already set it to 100000, so by the time you realize you want to set it to 0 to push out an emergency update, it's too late.

I guess a cache-mode param in update() is an OK solution... seems like unnecessary complexity to me though.

@jungkees

I think I'm fine with that. We might want to consider .update() as a method with which devs can explicitly ping the network resource anytime. Let's also forget about the cache-mode option.

I believe .update() is the only caller of the Update algorithm that'd always bypass the cache among all the update paths for now. Do you agree?

I'm planning to add an optional force bypass cache flag in the Soft Update and Update algorithms.

@mfalken commented Aug 13, 2015

Sounds right, maybe @jakearchibald can jump in to confirm.

@jungkees

Yeah, on reflection, exposing an option bag would come in handy for devs depending on what they want to do. /cc @jakearchibald

@jungkees

Added a force bypass cache flag, which may be needed in future work: 3d675eb.

@jakearchibald

.update() allows me to check for updates to the service worker outside of the usual schedule, but I don't see why the SW fetch should behave any differently to fetch(url) or cache.add(url) in terms of cache interaction.

I might call .update() when page visibility changes, or on a long setTimeout, but if I don't want it served from the HTTP cache, the script shouldn't have a max-age higher than 0 set.
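
A page-side sketch of that pattern, for illustration (whether these checks consult the HTTP cache is the question at hand; the interval chosen is arbitrary):

```js
// Page-side sketch: ask the UA to re-check the SW script outside the usual
// navigation-driven schedule, e.g. when the tab becomes visible again or on a timer.
navigator.serviceWorker.ready.then((registration) => {
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'visible') {
      registration.update();
    }
  });
  setInterval(() => registration.update(), 6 * 60 * 60 * 1000); // every 6 hours, say
});
```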

You may have already set it to 100000, so by the time you realize you want it to set it to 0 to push out an emergency update it's too late

If all your resources are served from the cache, all you can do is fix the SW cache headers and wait (a maximum of) 24hrs. Even if .update ignores the cache, it's impossible to get the page to run that.

If you're not serving everything from the cache, you can call .register and give the SW url a querystring, eg 'sw.js?v2'. Although I'm not against an option to .update that causes a cache-bypass.

If I'm trying to recover from a max-age incident, would I call .update({bypassCache: true}) per page load for 24hrs? Will that cause double requests for users that didn't see a SW with a max-age?

@mfalken commented Aug 17, 2015

OK, Jake's explanation makes sense. I just wanted to raise the alarm since the spec long had the non-normative note "update() pings the server for an updated version of this script without consulting caches" and I didn't recall overturning that at the F2F. Sounds like Jake et al. have thought it through, so I'm OK with it.

@jungkees

Agreed. Let's stick to what has been discussed here.

@jungkees

From https://codereview.chromium.org/1283273002/, another question came up about scheduled updates by the user agents.

Do we consider the service worker's max age, set to min(the Cache-Control max-age value, 86400), the desired lifetime of a service worker? In that case, using this value for an auto-update seems fine.

On the other hand, it still sounds reasonable to use the 24h rule for auto-updating a stale SW, as before. Scheduling updates for all the installed service workers in a short timeframe based only on the Cache-Control max-age values would be too much for user agents, I guess.

If we want to keep the 24h auto-update rule, the service worker's max age internal slot in the spec is not necessary at all. As pointed out by @nhiroki (https://codereview.chromium.org/1283273002/diff/20001/content/browser/service_worker/service_worker_context_request_handler.cc#newcode72), the Cache-Control max-age will cover all the cases in the network layer.
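
For clarity, the cap being discussed, as a tiny sketch (names are illustrative, not spec terms):

```js
// Effective freshness lifetime of the SW script: whatever Cache-Control: max-age
// says, capped at 24 hours (86400 seconds).
function effectiveMaxAgeSeconds(cacheControlMaxAge) {
  return Math.min(cacheControlMaxAge, 86400);
}
```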

@jakearchibald

I'm struggling to get my head around this. What's the difference?

I don't think we want to force the browser to perform an update on an interval. Although the browser may check for updates more regularly, it only has to do it after navigation, or when other specs trigger updates (eg on push).

@jungkees

I was talking about the "24 hour spontaneous update" that the ticket was originally about. I think I was a bit confused, as if the user agent had to do this spontaneous update regularly even within the service worker's max-age interval. But it seems we don't want to specify that, right?

The current setting in the Soft Update algorithm should be sufficient, I guess: "The user agent may call this as often as it likes to check for updates."

it only has to do it after navigation, or when other specs trigger updates (eg on push).

Yes, this is correct. We're doing that.

@jakearchibald

as if the user agent had to do this spontaneous update regularly even within the service worker's max-age interval. But it seems we don't want to specify that, right?

Agreed, we don't want to specify that.

@jungkees reopened this Aug 26, 2015
@jungkees

Re-opening to discuss the criteria for bypassing the browser cache.

The registration's last update time is used for this purpose in the spec, and the value is updated only when a new version is fetched from the network and the resource is new. (I.e., it doesn't care whether the subsequent installation fails. And it's not updated when the response's status was 304, i.e. the resource on the server is still a byte-for-byte match.)

On the other hand, the current Chrome implementation shows slightly different behavior. It has a corresponding member, |last_update_check|, on the registration, and the value is updated in two cases: 1) when the response's status was 304, or 2) when the new version of the resource has been received and successfully installed. (That is, it doesn't update the value when the installation fails, even if it actually fetched a new script resource from the network.)

I've been pondering this browser cache bypassing rule for a bit. I think the overall rule is "we must go to the network if we haven't within the last 24 hours". (max-age is covered by the network layer anyway.)

That considered, I'd suggest we just use the simplest rule, where the registration's last update check time is updated if and only if the response comes from the network (i.e. including the 304 case; and we don't care about the subsequent installation of this new resource). I think we can check this condition as: the response's cache state is not "local".

I think we should not take installation failure into account here because, coming back to the overall rule, we already hit the network, and the browser cache should already have been bumped with this new resource.
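
A minimal sketch of that rule (`lastUpdateCheckTime` and `cacheState` here are illustrative stand-ins for the spec's internal concepts, not real API):

```js
// Bump the registration's last-update-check time whenever the response actually
// came over the network (including 304 Not Modified), regardless of whether a
// subsequent installation succeeds or fails.
function recordUpdateCheck(registration, response, now = Date.now()) {
  if (response.cacheState !== 'local') { // "local" = answered from the browser cache
    registration.lastUpdateCheckTime = now;
  }
}
```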

Thoughts?

/cc @jakearchibald @nhiroki @slightlyoff @wanderview @annevk

@slightlyoff

Install failure for a new version seems like a situation we might be able to handle independently; i.e. by retrying the install.

Regarding when we should update the timestamp, the simpler rule, which doesn't pay attention to install success or failure, doesn't strike me as problematic. A site with a busted SW is the reason to have the backstop, and potentially DoSing a server for which there's a failed SW install seems like a bad idea (the natural consequence of re-checking on every navigation after 24 hours when the install fails).

@jungkees

Addressed: 569863f. Now the last update check time value is updated when a valid response is received for a request that bypassed the browser cache (the Update algorithm, step 4.13). In addition, the service worker's max age internal slot has been removed, as it's not required to specify the behavior. A note has been added in the Update algorithm, step 4.10, instead.

@jungkees commented Sep 8, 2015

Closing.

@jungkees closed this as completed Sep 8, 2015
@wanderview

Gecko bug to implement the new update mechanism: https://bugzilla.mozilla.org/show_bug.cgi?id=1207727

@jungkees

Please note that there's an ongoing discussion in Blink: https://codereview.chromium.org/1283273002/. There's a possibility we'd have to move the timestamp-updating step from the Update algorithm step 4.13 to step 4.24 (i.e. from before validating the response to after the validation).

/cc @wanderview

@jungkees

Reopening to address the issue encountered during implementation, as noted in the last comment.

@jungkees reopened this Sep 29, 2015
@jungkees

F2F: Agreed to move the step updating the timestamp from 4.13 to 4.24.
