
24 hour spontaneous update doesn't seem specced #514

Closed
mfalken opened this issue Oct 16, 2014 · 50 comments

@mfalken commented Oct 16, 2014

A few points related to the 24 hour update:

  • There's a note that: "update() pings the server for an updated version of this script without consulting caches. This is conceptually the same operation that UA does maximum once per every 24 hours."
    But I don't see where this operation is defined.
  • The "without consulting caches" part seems wrong. update() leads to Update, which busts the browser cache only once every 24 hours.
  • The Soft Update algorithm mentions: "The user agent may call this as often as it likes to check for updates." This seems to contradict the "maximum once every 24 hours" statement (maybe that is meant to be "at least once..."?)
  • I realize that Fetch schedules a Soft Update, and Update busts the browser cache every 24 hours. But that's different from a spontaneous 24-hour update, no? (See the sketch after this list.)
  • nit: It might be more consistent and precise to change the "forcing a network fetch if cached entry is greater than 1 day old" to "greater than 24 hours old".
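
To make the rule under discussion concrete, here is a minimal sketch of the "bust the browser cache at most once per 24 hours" behavior, assuming a hypothetical per-registration `lastNetworkFetchTime`; none of these names are spec terms:

```js
// Hypothetical sketch, not spec text: when an update check runs, it bypasses the
// HTTP cache only if the cached script entry is more than 24 hours old.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function shouldBypassHttpCache(lastNetworkFetchTime, now = Date.now()) {
  // lastNetworkFetchTime: when the SW script was last fetched from the network
  return now - lastNetworkFetchTime > ONE_DAY_MS;
}
```
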
@jakearchibald

/me sounds the horn for @annevk, @slightlyoff, @jungkees, @sicking

There seems to be a lot of confusion around this. Here's what I always thought was the intention here:

Appcache checks for updates per navigation. This works pretty well. What doesn't work well is that if you set a long max-age on the manifest, it respects it and you end up with lock-in.

With ServiceWorker we kept the update-per-navigation part, and protected against lock-in by respecting max-age (etc.) but capping it at 24 hours.

The UA may check for updates more regularly if it wishes, but it is not required to.

I've seen some threads suggesting the SW expires after 24 hours; that would be disastrous if you're offline for more than 24 hours.

@mfalken commented Oct 17, 2014

I may have just been confused. I thought the spec intended to mandate a 24 hour spontaneous update.

FWIW the current Chromium implementation does as Jake explains above: update-on-navigate with 24 hour cache busting, and no (optional) spontaneous update.

The spec could still be clearer on the points I mentioned above.

@jakearchibald

@mattto as you point out, there are a couple of parts in the spec that suggest some kind of special 24hr auto-update. I think there's just been a lot of confusion around this. Hopefully we can settle it here.

@annevk commented Oct 17, 2014

This doesn't work if you never navigate to a page but do get push messages from it. We still don't want the service worker to be older than 24h in that scenario, so this seems broken.

@jakearchibald

Wouldn't it make more sense for the push spec to trigger updates at the appropriate time? When push messages are received, for example.

I want to avoid running updates for sites the user doesn't use/visit.

@annevk commented Oct 17, 2014

That would mean every single extension specification would have to deal with running Soft Update. That's another reason why this coupling of FetchEvent and service workers is broken.

@annevk commented Oct 17, 2014

So to be clear, that's not an acceptable solution.

@jakearchibald

Had a chat with the team working on Push. How about the following (a rough sketch follows the list):

  • Update is called whenever the ServiceWorker spins up, if we haven't checked for updates within the previous 24hrs
  • Update is called per navigation (so many can happen within a 24hr window)
  • Specs that build on SW can call update more frequently if they wish
  • The UA can call update more frequently if it wishes
  • Updates obey max-age headers (up to a 24hr max)
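
A rough sketch of the trigger logic in that list, treating `lastUpdateCheckTime` as a hypothetical per-registration timestamp and `softUpdate()` as a stand-in for the spec's Soft Update algorithm:

```js
// Hypothetical sketch, not spec text.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function softUpdate(registration) {
  // Placeholder for the spec's Soft Update algorithm: fetch the SW script,
  // compare it, and kick off installation if it changed.
  console.log('checking for an updated script for', registration.scope);
}

// The UA runs an update check when the SW is about to spin up,
// but only if no check has happened within the last 24 hours.
function onServiceWorkerStartup(registration, now = Date.now()) {
  if (now - registration.lastUpdateCheckTime >= ONE_DAY_MS) {
    softUpdate(registration);
  }
}

// Navigations always schedule a check, so many can happen within a 24h window.
function onNavigation(registration) {
  softUpdate(registration);
}
```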

@annevk commented Oct 17, 2014

If the specification actually defined the life cycle, you would run Soft Update just before you plan on booting the service worker (for whatever reason) if the time was up (or if the UA decided it wanted to check at that point, that should also be allowed).

That way you avoid locking yourself to any particular thing such as navigation or push or some such, as that just doesn't make much sense.

@jakearchibald

I think we're suggesting the same thing. (Before/after doesn't matter so much; it won't wait for the update to complete before delivering events. The only state that delays events is 'activating'.)

@annevk commented Oct 17, 2014

Yeah, I guess we do. I don't think we need:

  • Update is called per navigation (so many can happen within a 24hr window)
  • Specs that build on SW can call update more frequently if they wish

We already give UAs the freedom to update whenever they like, and that way we don't intertwine service worker features with the service worker itself.

@jakearchibald

I think update-on-navigation is important. If I've released a buggy version I don't want my users to be locked into 24hrs of brokenness. I want to be able to install a new version on navigation, and if need be .replace the old one.

If I truly believe in myself, I can use max-age to reduce the check frequency.
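
For example (a Node-style sketch under assumed file names, illustrative values only), a site confident in its releases could serve the SW script with a modest max-age so update checks rarely hit the network, knowing the UA caps the effective freshness at 24 hours regardless:

```js
// Hypothetical server sketch: serve ./sw.js with Cache-Control: max-age=3600.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  if (req.url === '/sw.js') {
    res.setHeader('Content-Type', 'application/javascript');
    res.setHeader('Cache-Control', 'max-age=3600'); // one hour; 24h is the hard cap
    fs.createReadStream('./sw.js').pipe(res);
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8080);
```
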

@slightlyoff

The UA needs to be free to check for freshness whenever it wants. The constraint in the spec is that WHEN it does this (whenever that is), it skips the cache if the previous response is more than 24 hours stale. I'm not keen to get deeper into these weeds. If clarification is needed about how the update logic skips the cache, that's great, but this was intentionally left to UAs to design policies for.

@annevk commented Oct 21, 2014

@slightlyoff are you saying it should not be required to check for freshness when navigating as @jakearchibald is suggesting? I'm getting confused.

@jakearchibald

We're having a ServiceWorker hack day in the office today. People are relying heavily on update checks on navigate. We can't remove this without destroying developer experience.

@KenjiBaheux

Blink is implementing the behavior as it was intended. Namely,

  • spontaneous updates happen on navigation
  • we respect max-age (if it's <= 24 hours)
  • we bust the cache if age > 24 hours

Let me add "no impact (blink)" assuming that this still is the desired behavior (seems likely).
If a different conclusion is reached feel free to remove the "no impact (blink)" label so that I can file a bug against blink's implementation.

@annevk commented Oct 22, 2014

@KenjiBaheux @jakearchibald mentioned earlier that busting the cache was not desirable. If you went without connectivity for longer than 24h you'd still want to get the application.

@slightlyoff

@annevk: I think you misunderstand.

The intent isn't to bust the caches for navigations to an active SW, only to bust caches for background checks to see if there's an updated version of the SW script. Until (and unless) there's an updated SW which runs through its install phase without issue, the previously active SW CONTINUES to be active and cached, meaning that the application will continue to work (including in cases where the SW update requests fail or error for some reason).

It'd be a pretty poor offline system if we pulled the rug out from under people when they need it most ;-)

Closing.

@annevk commented Oct 22, 2014

Why did you close this? It's still not actually defined.

@annevk reopened this Oct 22, 2014
@michael-nordman

Should the update algo have a mechanism to evict a registration for which the sw script has been removed from the server? At present, any unacceptable server response to the request for the sw script results in the existing registration remaining.

@mfalken commented Oct 24, 2014

@michael-nordman I think that was decided in #204, "Update algorithm should unregister SW on 404 and 410 errors" (the conclusion was no).

For this issue, it sounds like all that's remaining is for the spec to define auto-update more clearly?

@KenjiBaheux

@annevk Sorry for the confusion. Confirming that Blink is doing what @slightlyoff says: if we don't have a successful update, we keep the existing registration. So if you go offline past the 24-hour window, there won't be any service interruption.

@sicking commented Oct 24, 2014

Since I was asked to comment, I think we should have the following goals:

  • A developer shouldn't have to do anything in order to have a SW which updates automatically. I.e. the default policy should be a good one.
  • A developer should be able to adjust the number of requests for the SW script resource by using normal http headers.
  • However we should limit the damage of accidentally sending a really long expiration header by always checking at least every 24h for an actively used SW.
  • A developer should be able to force an update at any time by explicitly calling some API.

This is useful if the developer gets out-of-band information about a new version being available. For example, a developer should be able to send a push message which then causes the SW to update itself. Or if a chat app keeps a WebSocket connection for bidirectional communication, it should be able to send a command from the server to the client indicating that a new version was just released containing a security fix that needs to be downloaded immediately. (See the sketch at the end of this comment.)

To accomplish these goals I think we should define that:

  • We do a "update check" any time the user navigates to URL which is within the SWs scopes.
  • We do a "update check" any time a SW handles an non-fetch/non-message event, such as events for an incoming push message or geofencing message. These might not trigger a navigation so it's important that these also do update checks.
  • When we do an "update check", if we haven't actually hit the network to verify that the SW script resource is up-to-date we always hit the network. No matter what http semantics prescribe.
  • Otherwise, doing an "update check" follow normal http semantics. Potentially with the exception that we don't do heuristics based on last-modified date. I don't feel strongly about that, but I think it should be explicitly defined one way or another.
  • A call to update() should not cause a normal "update check". Instead it should always hit the network no matter what http caching headers has otherwise indicated. I.e. if the server has indicated that the expiration for the SW script is 1 hour, and we checked with the server 5 minutes ago, a call to update() should still cause a network request to verify that the script is up-to-date.
  • A call to update() should cause the server to get hit even if there's currently a SW in the installing state. I.e. even when "installing worker" is not null. Otherwise there's a risk of race where if we started installing a SW update right before a security update was released, that once the client sees that a security update has shipped and calls update(), that that becomes a no-op since we're still installing the pre-security-update version. It seems fine however to serialize and only hit the network after installation of the previous version finishes, though slightly less optimized.
  • Define that a UA has the the right to do additional "update check"s whenever it wants. But that this should generally follow the normal "update check" semantics defined above. I.e. it should generally respect http headers except when those dictate longer than 24 hour update periods. Though maybe we could also allow the UA should be allowed to bypass any http header limits if needed?

Right now very little of this seems to be defined by the spec. Even the 24 hour limit seems to only be mentioned in a non-normative note.
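
As a hedged illustration of the push-driven scenario above (the message payload shape is hypothetical, and whether update() bypasses the HTTP cache is exactly what's under discussion):

```js
// Service-worker-side sketch: a push message tells the SW that a new version
// (e.g. one containing a security fix) is available, and the SW asks the UA to
// re-check its own script.
self.addEventListener('push', (event) => {
  const data = event.data ? event.data.json() : {}; // payload shape is hypothetical
  if (data.type === 'new-version-available') {
    event.waitUntil(self.registration.update());
  }
});
```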

@jungkees added this to the Version 1 milestone Mar 23, 2015
@jungkees

Thanks for review @slightlyoff! Closing.

@mfalken commented Aug 13, 2015

I'm surprised that update() no longer forces bypassing the HTTP cache, as https://code.google.com/p/chromium/issues/detail?id=520068 says. I don't recall deciding that at the F2F? What was the rationale?

I would think that if an author writes update(), they expect it to ping the server for the latest script.

@jungkees

I'm not entirely sure we've talked about whether .update() also applies the same rule, but having checked with @jakearchibald, it seems fine to apply the same rule to all the updates gated on the Update algorithm. Devs can now adjust the cap by setting the Cache-Control max-age header themselves, and they could control this when an immediate update is needed. The reason we changed it this way is mostly consistency among the different update requests. .update({ cache-mode: "reload" }) // or "default" seems to be another way to handle this? WDYT?

@mfalken commented Aug 13, 2015

Relying on setting max-age doesn't sound right. You may have already set it to 100000, so by the time you realize you want to set it to 0 to push out an emergency update, it's too late.

I guess a cache-mode param in update() is an OK solution... seems like unnecessary complexity to me though.

@jungkees

I think I'm fine with that. We might want to consider .update() as a method with which devs can explicitly ping the network resource anytime. Let's also forget about the cache-mode option.

I believe .update() is the only caller of the Update algorithm that'd always bypass the cache among all the update paths for now. Do you agree?

I'm planning to add an optional force bypass cache flag in the Soft Update and Update algorithms.

@mfalken commented Aug 13, 2015

Sounds right, maybe @jakearchibald can jump in to confirm.

@jungkees

Yeah, on reflection, exposing an option bag would come in handy for devs depending on what they want to do. /cc @jakearchibald

@jungkees

Added a force bypass cache flag, which may be needed in future work: 3d675eb.

@jakearchibald

.update() allows me to check for updates to the service worker outside of the usual schedule, but I don't see why the SW fetch should behave any differently to fetch(url) or cache.add(url) in terms of cache interaction.

I might call .update() when page visibility changes, or on a long setTimeout, but if I don't want it served from the HTTP cache, the script shouldn't have a max-age higher than 0 set.
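
A page-side sketch of that pattern, for illustration (whether these checks consult the HTTP cache is the question at hand; the interval chosen is arbitrary):

```js
// Page-side sketch: ask the UA to re-check the SW script outside the usual
// navigation-driven schedule, e.g. when the tab becomes visible again or on a timer.
navigator.serviceWorker.ready.then((registration) => {
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'visible') {
      registration.update();
    }
  });
  setInterval(() => registration.update(), 6 * 60 * 60 * 1000); // every 6 hours, say
});
```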

You may have already set it to 100000, so by the time you realize you want it to set it to 0 to push out an emergency update it's too late

If all your resources are served from the cache, all you can do is fix the SW cache headers and wait (a maximum of) 24hrs. Even if .update ignores the cache, it's impossible to get the page to run that.

If you're not serving everything from the cache, you can call .register and give the SW url a querystring, eg 'sw.js?v2'. Although I'm not against an option to .update that causes a cache-bypass.

If I'm trying to recover from a max-age incident, would I call .update({bypassCache: true}) per page load for 24hrs? Will that cause double requests for users that didn't see a SW with a max-age?

@mfalken commented Aug 17, 2015

OK, Jake's explanation makes sense. I just wanted to raise the alarm since the spec long had the non-normative note "update() pings the server for an updated version of this script without consulting caches" and I didn't recall overturning that at the F2F. Sounds like Jake et al. have thought it through, so I'm OK with it.

@jungkees

Agreed. Let's stick to what has been discussed here.

@jungkees

From https://codereview.chromium.org/1283273002/, another question came up about scheduled updates by the user agents.

Do we consider the service worker's max age, set to min(the Cache-Control max-age value, 86400), the desired lifetime of a service worker? In that case, using this value for an auto-update seems fine.

On the other hand, it still sounds reasonable to use the 24h rule for auto-updating a stale SW, as before. Scheduling updates for all the installed service workers in a short timeframe based only on the Cache-Control max-age values would be too much for user agents, I guess.

If we want to keep the 24h auto-update rule, the service worker's max age internal slot in the spec is not necessary at all. As pointed out by @nhiroki (https://codereview.chromium.org/1283273002/diff/20001/content/browser/service_worker/service_worker_context_request_handler.cc#newcode72), the Cache-Control max-age will cover all the cases in the network layer.
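
For clarity, the cap being discussed, as a tiny sketch (names are illustrative, not spec terms):

```js
// Effective freshness lifetime of the SW script: whatever Cache-Control: max-age
// says, capped at 24 hours (86400 seconds).
function effectiveMaxAgeSeconds(cacheControlMaxAge) {
  return Math.min(cacheControlMaxAge, 86400);
}
```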

@jakearchibald

I'm struggling to get my head around this. What's the difference?

I don't think we want to force the browser to perform an update on an interval. Although the browser may check for updates more regularly, it only has to do it after navigation, or when other specs trigger updates (eg on push).

@jungkees

I was talking about the "24 hour spontaneous update" that the ticket was originally about. I think I was a bit confused, as if the user agent had to do this spontaneous update regularly even within the service worker's max-age interval. But it seems we don't want to specify that, right?

The current setting in the Soft Update algorithm should be sufficient, I guess: "The user agent may call this as often as it likes to check for updates."

it only has to do it after navigation, or when other specs trigger updates (eg on push).

Yes, this is correct. We're doing that.

@jakearchibald

as if the user agent had to do this spontaneous update regularly even within the service worker's max-age interval. But it seems we don't want to specify that, right?

Agreed, we don't want to specify that.

@jungkees reopened this Aug 26, 2015
@jungkees

Re-opening to discuss the criteria for bypassing the browser cache.

The registration's last update time is used for this purpose in the spec, and the value is updated only when a new version is fetched from the network and the resource is new. (I.e., it doesn't care whether the subsequent installation fails. And it's not updated when the response's status was 304, i.e. the resource on the server is still a byte-for-byte match.)

On the other hand, the current Chrome implementation shows slightly different behavior. It has a corresponding member, |last_update_check|, on the registration, and the value is updated in two cases: 1) when the response's status was 304, or 2) when the new version of the resource has been received and successfully installed. (That is, it doesn't update the value when the installation fails, even if it actually fetched a new script resource from the network.)

I've been pondering this browser cache bypassing rule for a bit. I think the overall rule is "we must go to the network if we haven't within the last 24 hours". (max-age is covered by the network layer anyway.)

That considered, I'd suggest we just use the simplest rule, where the registration's last update check time is updated if and only if the response comes from the network (i.e. including the 304 case; and we don't care about the subsequent installation of this new resource). I think we can check this condition as: the response's cache state is not "local".

I think we should not take installation failure into account here because, coming back to the overall rule, we already hit the network, and the browser cache should already have been bumped with this new resource.
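
A minimal sketch of that rule (`lastUpdateCheckTime` and `cacheState` here are illustrative stand-ins for the spec's internal concepts, not real API):

```js
// Bump the registration's last-update-check time whenever the response actually
// came over the network (including 304 Not Modified), regardless of whether a
// subsequent installation succeeds or fails.
function recordUpdateCheck(registration, response, now = Date.now()) {
  if (response.cacheState !== 'local') { // "local" = answered from the browser cache
    registration.lastUpdateCheckTime = now;
  }
}
```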

Thoughts?

/cc @jakearchibald @nhiroki @slightlyoff @wanderview @annevk

@slightlyoff

Install failure for a new version seems like a situation we might be able to handle independently; i.e. by retrying the install.

Regarding when we should update the timestamp, the simpler rule, which doesn't pay attention to install success or failure, doesn't strike me as problematic. A site with a busted SW is the reason to have the backstop, and potentially DoSing a server for which there's a failed SW install seems like a bad idea (the natural consequence of re-checking on every navigation after 24 hours when the install fails).

@jungkees

Addressed: 569863f. Now the last update check time value is updated when a valid response is received for a request that bypassed the browser cache (the Update algorithm, step 4.13). In addition, the service worker's max age internal slot has been removed, as it's not required to specify the behavior. A note has been added in the Update algorithm, step 4.10, instead.

@jungkees commented Sep 8, 2015

Closing.

@jungkees closed this as completed Sep 8, 2015
@wanderview

Gecko bug to implement the new update mechanism: https://bugzilla.mozilla.org/show_bug.cgi?id=1207727

@jungkees

Please note that there's an ongoing discussion in Blink: https://codereview.chromium.org/1283273002/. There's a possibility we'd have to move the timestamp-updating step from the Update algorithm step 4.13 to step 4.24 (i.e. from before validating the response to after the validation).

/cc @wanderview

@jungkees

Reopening to address the issue encountered during implementation, as noted in the last comment.

@jungkees reopened this Sep 29, 2015
@jungkees

F2F: Agreed to move the step updating the timestamp from 4.13 to 4.24.
