Service Workers are too hard to clean-up #1695

Open
Hornwitser opened this issue Oct 30, 2023 · 3 comments

Comments

@Hornwitser

I have noticed an upwards trend in the number of websites that are broken for me due to service workers being published on them and then left to rot in my browser's caches. A while ago I ran a query on the status codes returned by over 400 service workers registered in my Edge browser and got the following HTTP response codes:

HTTP Code     Count
no response       8
200             336
301              15
302               2
400               5
403              17
404              29
418               1
500               1
502               1
504               1
Total           416

52 out of 416 service workers registered in my browser (the 400, 403, 404 and 418 rows) are registered on locations that now return 4xx response codes, which indicate a client error in the request. The failure mode from leaving a stale service worker on a site ranges from serving stale data to the website being completely inaccessible.
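A tally like the one above could be produced along these lines. This is only a rough sketch and not the query that was actually run: the function name `tallyStatusCodes` and the injectable `fetchFn` parameter are made up for illustration, and the list of script URLs is assumed to have been collected by some other means.

```javascript
// Tally the HTTP status returned for each service worker script URL.
// `fetchFn` defaults to the global fetch; it is a parameter (an assumption
// of this sketch) so the function can be exercised without network access.
async function tallyStatusCodes(scriptUrls, fetchFn = fetch) {
  const counts = new Map();
  for (const url of scriptUrls) {
    let key;
    try {
      // "manual" is meant to keep redirects from being followed silently;
      // how the redirect response is reported differs between runtimes.
      const response = await fetchFn(url, { redirect: "manual" });
      key = String(response.status);
    } catch {
      // DNS failure, refused connection, timeout, and so on.
      key = "no response";
    }
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

The result is a `Map` from status code (or "no response") to the number of script URLs that produced it.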

For example, when I want to buy tickets from my regional train company sj.no, I'm greeted by this:

image

If you compare this to what the live site actually looks like you may notice it is very different. The ticket ordering does not work, and pretty much every page you can navigate to is not functional.

Correctly removing Service Workers is far too difficult

Service workers, once registered, live on indefinitely and will not go away until explicitly removed. However, removing them is deceptively hard. If you search for how to uninstall a service worker, you'll get Stack Overflow questions filled with wrong answers.

Just to illustrate how difficult it is: going back to my regional train provider sj.no, they actually tried to remove their stale service worker by hosting a new one at the same resource location as the old one, which attempts to unregister the old one.

// Content of https://www.sj.no/service-worker.js
// Deregister old PWAs
console.debug("Deregistering service workers...");
navigator.serviceWorker.getRegistrations().then(workers => {
    console.debug(workers);
    workers.forEach(worker => {
        console.debug("Deregistered service worker");
        worker.unregister();
    });
});

Unfortunately this was probably copied and pasted from misinformation and doesn't work. Edge throws the following error when it is loaded, and the old one is therefore never removed.

service-worker.js:3  Uncaught TypeError: Cannot read properties of undefined (reading 'getRegistrations')
    at service-worker.js:3:25
(anonymous) @ service-worker.js:3
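For contrast, here is a sketch of a replacement script that unregisters through `self.registration`, which is how a service worker can reach its own registration, instead of going through `navigator.serviceWorker`. The function name `installSelfDestruct` and the choice to reload open windows afterwards are assumptions of this sketch, not something sj.no or the spec prescribes.

```javascript
// Hypothetical replacement for /service-worker.js.
// `scope` is the service worker global scope (normally `self`).
function installSelfDestruct(scope) {
  scope.addEventListener("install", () => {
    // Take over without waiting for existing tabs to close.
    scope.skipWaiting();
  });
  scope.addEventListener("activate", (event) => {
    event.waitUntil(
      (async () => {
        // Remove the registration this worker belongs to.
        await scope.registration.unregister();
        // Reload open pages so they are fetched from the network again.
        const windows = await scope.clients.matchAll({ type: "window" });
        await Promise.allSettled(
          windows.map((client) => client.navigate(client.url))
        );
      })()
    );
  });
}

// Only wire it up when actually running as a service worker.
if (
  typeof ServiceWorkerGlobalScope !== "undefined" &&
  self instanceof ServiceWorkerGlobalScope
) {
  installSelfDestruct(self);
}
```

Note that a script like this still has to stay hosted for as long as returning visitors may have the old worker installed, which is the core of the complaint.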

It should not be this hard to remove a service worker. HTTP's 404 status code means the resource does not exist. What rationale is there for a Service Worker to stay alive after the origin server says it no longer exists? How are HTTP server operators supposed to know that they can't simply remove a service worker, and that they instead need to replace it with a self-destructing service worker and host that forever?

What if you take over a domain that previously hosted service workers? Are you supposed to just know that you need to configure your web server to reply to every request that carries a Service-Worker header with a self-destructing Service Worker?
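To make concrete what that configuration would involve, here is a rough sketch of the decision a server would have to make for every incoming request. Browsers send `Service-Worker: script` when fetching a service worker script; everything else here (the `respondTo` function, the shape of its return value, the inlined worker source) is made up for illustration.

```javascript
// Minimal self-removing worker served in place of whatever used to be there.
const SELF_DESTRUCT_WORKER = `
self.addEventListener("install", () => self.skipWaiting());
self.addEventListener("activate", () => self.registration.unregister());
`;

// Decide how to answer a request from its headers (keys lower-cased, as
// Node's http module provides them). Returns null for ordinary requests.
function respondTo(headers) {
  if (headers["service-worker"] === "script") {
    return {
      status: 200,
      headers: {
        "Content-Type": "text/javascript",
        "Cache-Control": "no-store",
      },
      body: SELF_DESTRUCT_WORKER,
    };
  }
  return null; // not a service worker script fetch; handle normally
}
```

A server would call something like `respondTo(req.headers)` before its normal routing and fall through when it returns `null`.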

Violating HTTP semantics for the sake of some sort of longevity that is now demonstrably breaking real-world websites is not a good way for this to work. I see no sensible reason why a Service Worker should keep on living if the origin server replies with a 4xx status code when it's updated. Similarly, if the newly fetched service worker throws an error during execution as part of an update, this should also be taken as a signal to remove the existing Service Worker.

Diagnosing a broken Service Worker is impossible for an end user

Imagine you're an end user who has no idea what a Service Worker is, and the webpage you need to interact with is broken due to a stale Service Worker. You reload the page. You Ctrl+Reload the page. You reboot your computer. You even reinstall your browser (which then, behind the scenes, reuses your existing profile and caches). But no matter what you do, the webpage just doesn't seem to want to work on your computer.

You then contact support, who also have no idea about Service Workers. They test the site on their end and it works fine, and despite going through a bunch of steps the website is still broken for you and support can't help you. Sure, a competent web developer can say you can easily remove the service worker by right-clicking on the web page, selecting Inspect, going to the Application pane, clicking on the Service workers section and then clicking Unregister. But what if you're on a phone or a tablet?
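For someone who does have access to developer tools, the same cleanup can be done from the console of a page on the affected origin, where `navigator.serviceWorker` is available. A small sketch follows; the function name `unregisterAll` and the injectable `container` parameter are assumptions made so the snippet can be tried outside a browser.

```javascript
// Run from a page on the affected origin, for example in the DevTools
// console. `container` defaults to the page's navigator.serviceWorker.
async function unregisterAll(container = navigator.serviceWorker) {
  const registrations = await container.getRegistrations();
  // unregister() resolves to true when the registration was removed.
  const results = await Promise.all(
    registrations.map((registration) => registration.unregister())
  );
  return results.filter(Boolean).length;
}
```

Calling `unregisterAll()` resolves to the number of registrations that were removed. None of this helps the end user described above.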

Conclusion

The difficulty of using Service Workers correctly is leaving real-world websites permanently broken in ways that are difficult for developers, site operators and end users to diagnose and correct. If not even web framework developers can get this right (as is the case for svelte.dev), what hope do regular web developers have of getting it right?

Failing to properly implement a Service Worker should result in the browser removing the service worker rather than the website becoming inoperable. And removing a Service Worker should be as simple as removing the service worker script from the server. Requiring server operators to host a special self-destructing service worker until the end of time, so that returning users do not end up with a permanently broken website, is not an acceptable way for this technology to work.

@wanderview changed the title from "Service Workers are an abject failure" to "Service Workers are too hard to clean-up" on Oct 30, 2023
@wanderview
Member

I think there is reasonable feedback here on the difficulty of cleaning up service workers that are no longer wanted, so I changed the title to clarify that.

@asutherland

There was some long-running discussion of this issue in #204, where #204 (comment) records the most recent decision, which was made as a result of a F2F discussion.

The documented decision there was made on the basis of 2 very large sites having occasional transient problems where it would be undesirable to unregister the ServiceWorker due to a configuration hiccup that served a 4xx response, but the concern does seem universal. I think that's compatible with the observation on this issue that persistent 4xx responses do happen and are reasonable to handle. Clear-Site-Data has been mentioned in this context and is useful for sites that still exist, but it's still a big hammer[1] and not an option for expired domains, etc.
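For reference, the Clear-Site-Data option amounts to adding one response header on a site that still exists. A minimal sketch, assuming the `"storage"` directive is the one wanted (it covers service worker registrations along with the origin's other storage, which is what makes it a big hammer); the helper name `withClearSiteData` is hypothetical.

```javascript
// Add a Clear-Site-Data header to a set of response headers.
// The double quotes are part of the header value.
function withClearSiteData(headers = {}) {
  return { ...headers, "Clear-Site-Data": '"storage"' };
}
```

A server would apply this to a response from the affected origin, for example `withClearSiteData({ "Content-Type": "text/html" })`.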

If we were to spec a solution to this, we could potentially do something like:

  • Each registration would maintain a "failed update count" that is incremented whenever we perform an update check that fails while the registration was stale, or via some other mechanism that ensures we only increment the counter at most once a day.
  • If the counter reaches a threshold, we unregister the registration. This may trigger cleanup for things associated with the registration, but would not touch the default bucket. (Or the storage bucket the SW is associated with; the semantics described in the buckets explainer's "Storage buckets and service workers" section only define a relationship to clear the SW when clearing the data, not the reverse, although honestly that could be nice. But at least if the registrations were removed due to no longer existing, that would make the bucket more subject to data clearing for lack of use.)
  • We could optionally let sites specify the desired number of failed updates before removal, but would have a default if not specified, allowing existing stale registrations to be removed by the spec and without requiring opt-in. I think it would be reasonable to also establish an upper bound on this value, although I would not be shocked to find there are people who would like to enable sneakernet-type use cases where the SW only updates on private networks and is otherwise inaccessible. (Although that raises questions of how such an origin could be a secure context.)
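The bookkeeping in the three bullets above could look roughly like the following. This is only an illustration of the idea, not spec text: the field names, the default of 30, the upper bound of 365 and the reset on a successful update are all assumptions made up for the sketch.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_THRESHOLD = 30; // hypothetical default, not from any spec
const MAX_THRESHOLD = 365;    // hypothetical upper bound

// registration: { failedUpdateCount, lastFailureCountedAt, siteThreshold }
// `now` is a timestamp in milliseconds. Returns true when the registration
// has reached its threshold and should be unregistered.
function recordFailedUpdate(registration, now) {
  const last = registration.lastFailureCountedAt;
  // Count at most one failure per day, however many checks fail.
  if (last === undefined || now - last >= DAY_MS) {
    registration.failedUpdateCount += 1;
    registration.lastFailureCountedAt = now;
  }
  // A site-specified value is honoured, but capped at the upper bound.
  const threshold = Math.min(
    registration.siteThreshold ?? DEFAULT_THRESHOLD,
    MAX_THRESHOLD
  );
  return registration.failedUpdateCount >= threshold;
}

// Assumption: a successful update check clears the history.
function recordSuccessfulUpdate(registration) {
  registration.failedUpdateCount = 0;
  registration.lastFailureCountedAt = undefined;
}
```

With a daily cap like this, a transient configuration hiccup of a few hours would add at most one to the count, while a script that has returned 404 for a month would reach the default threshold.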

1: In particular, there isn't a way to express "clear the site data if the user's last visit/data mutation is older than date Y when we overhauled the site". It had been proposed in passing, but there is nothing in the storage spec at this time to enable such functionality.

@Zipdox2

Zipdox2 commented Mar 13, 2024

I recently had an incident where a Gatsby site run by my friend still seemed to work despite it being offline and the domain having expired a long time ago. This caused a lot of confusion for my friend and me. It took some investigation with Wireshark before I figured out that a service worker was serving the site. It's understandable that this would be the case if I were offline, but I think the spec should have some kind of mechanism for detecting that the site is gone.


4 participants