Scope matching algorithm breaks sites that don't end in a slash #1272

Open
mgiuca opened this issue Feb 2, 2018 · 32 comments

Comments

@mgiuca

mgiuca commented Feb 2, 2018

The Match Service Worker Registration algorithm is a simple string prefix match, rather than a path segment prefix match. This means that if the SW scope does not end in a slash, you get unexpected behaviour, e.g.: if scope is "https://www.google.com/maps", you will match a (hypothetical) URL "https://www.google.com/mapsearch", which is intended to be a different product. The work-around for this issue is to always include a trailing slash in the scope.

I asked the Maps team to do this and they raised an important point: if they did that, then yes, "/mapsearch" would be correctly filtered out of the scope. But then the URL "https://www.google.com/maps" (no trailing slash) would also not be in the SW scope, and thus not hit the fetch handler. Since "https://www.google.com/maps" redirects to "https://www.google.com/maps/", this isn't a big deal, except that it breaks offline support. If the entirety of "https://www.google.com/maps/" works correctly offline, but someone links to "https://www.google.com/maps", the user would need to be online to make a network request to "https://www.google.com/maps" to get a redirect to "https://www.google.com/maps/", which would then be served by the SW.

So essentially, developers are forced to make a decision between two bad choices:

  • Put a slash on the end, and show a generic offline page for the slashless version of the path, or
  • Don't put a slash on the end, and accidentally capture other paths that start with the same letters.
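The dilemma is easy to reproduce. The following is only a simplified model of the string-prefix comparison described above, not the spec text: `matchesByStringPrefix` is a name made up for this example, and the real algorithm also ignores fragments and picks the longest matching scope.

```javascript
// Simplified model of today's scope check: a plain string prefix on the URL.
// (Hypothetical helper name; the real algorithm also picks the longest match.)
function matchesByStringPrefix(scope, url) {
  return url.startsWith(scope);
}

const origin = "https://www.google.com";

// Choice 1: trailing slash. /mapsearch is excluded, but so is /maps itself.
console.log(matchesByStringPrefix(origin + "/maps/", origin + "/maps/place")); // true
console.log(matchesByStringPrefix(origin + "/maps/", origin + "/maps"));       // false
console.log(matchesByStringPrefix(origin + "/maps/", origin + "/mapsearch"));  // false

// Choice 2: no trailing slash. /maps is included, but so is /mapsearch.
console.log(matchesByStringPrefix(origin + "/maps", origin + "/maps"));        // true
console.log(matchesByStringPrefix(origin + "/maps", origin + "/mapsearch"));   // true
```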

Is there any practical reason why this algorithm is a string prefix match, rather than a path segment prefix match? In a file system, I don't consider "/foobar/baz" to be inside the directory "/foo". Is it possible to change this behaviour now, or is it too late?
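For comparison, a path-segment prefix match might look roughly like the sketch below. This is not what the spec does; `matchesByPathSegments` is a hypothetical helper, and dropping empty segments (so that a scope of "/foo" and a scope of "/foo/" behave the same) is an assumption made for the example.

```javascript
// Sketch of a path-segment prefix match (hypothetical; not the spec algorithm).
// A URL is in scope only if every segment of the scope path matches, in order.
function matchesByPathSegments(scope, url) {
  const s = new URL(scope);
  const u = new URL(url);
  if (s.origin !== u.origin) return false;
  // Drop empty segments so "/foo" and "/foo/" describe the same directory.
  const scopeSegments = s.pathname.split("/").filter(Boolean);
  const urlSegments = u.pathname.split("/").filter(Boolean);
  if (scopeSegments.length > urlSegments.length) return false;
  return scopeSegments.every((segment, i) => segment === urlSegments[i]);
}

const base = "https://www.google.com";
console.log(matchesByPathSegments(base + "/maps", base + "/maps"));       // true
console.log(matchesByPathSegments(base + "/maps", base + "/maps/place")); // true
console.log(matchesByPathSegments(base + "/maps", base + "/mapsearch"));  // false
console.log(matchesByPathSegments(base + "/foo", base + "/foobar/baz"));  // false
```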

If we can't do a path segment match, as a secondary measure, can we add a rule saying if the scope ends with '/', it matches that path without the trailing slash. So scope "https://www.google.com/maps/" matches "https://www.google.com/maps" and "https://www.google.com/maps/anything", but not "https://www.google.com/mapsearch".

Note that the Web App Manifest spec has the same algorithm, and it was deliberately chosen for compatibility with the Service Worker algorithm. This is causing similar problems over there; see w3c/manifest#554; particularly this comment. I am making a similar proposal there.

@wanderview
Member

If we can't do a path segment match, as a secondary measure, can we add a rule saying if the scope ends with '/', it matches that path without the trailing slash. So scope "https://www.google.com/maps/" matches "https://www.google.com/maps" and "https://www.google.com/maps/anything", but not "https://www.google.com/mapsearch".

This seems the most palatable and least likely to cause breakage on the web.

@domenic
Contributor

domenic commented Feb 2, 2018

Why do people think matching on path segments would cause breakage? What would be an example breakage?

@wanderview
Member

wanderview commented Feb 2, 2018

You can match scopes against file names or even substrings of filenames today. Making it a path comparison doesn't seem compatible with that? Or maybe I don't understand that proposal.

For example, pretty sure we have tests that set scopes "https://foo.com/path/dummy-file" and expect it to control both "dummy-file.js" and "dummy-file.html".
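To make the compatibility concern concrete, here is a small sketch comparing the two rules on that file-name scope. The segment-wise rule (`bySegments`) is hypothetical, and both helpers assume the URLs are on the same origin.

```javascript
const scope = "https://foo.com/path/dummy-file";
const controlled = [
  "https://foo.com/path/dummy-file.js",
  "https://foo.com/path/dummy-file.html",
];

// Today's rule: plain string prefix.
const byStringPrefix = (scopeUrl, url) => url.startsWith(scopeUrl);

// Hypothetical segment-wise rule: every scope segment must match in full.
const bySegments = (scopeUrl, url) => {
  const a = new URL(scopeUrl).pathname.split("/");
  const b = new URL(url).pathname.split("/");
  return a.every((segment, i) => segment === b[i]);
};

for (const url of controlled) {
  // Logs the URL followed by "true false": in scope today,
  // out of scope under segment matching.
  console.log(url, byStringPrefix(scope, url), bySegments(scope, url));
}
```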

@wanderview
Member

I believe we also have scopes that use query strings for uniqueness:

https://searchfox.org/mozilla-central/source/testing/web-platform/tests/service-workers/service-worker/fetch-canvas-tainting-cache.https.html#10

There's probably less chance people are doing that one in the wild, but who knows.

@wanderview
Member

wanderview commented Feb 2, 2018

I will also just mention the scope mechanism is kind of lame since storage and permissions are origin based. It would be much nicer if we just had maps.google.com.

Edit: For example, when I visited the google maps PWA it asked me for location permission (reasonable), but I have to grant it to all of google.com (unreasonable IMO).

@domenic
Contributor

domenic commented Feb 2, 2018

Got it, thanks for filling in those examples for me. I guess in that sense the "secondary measure" seems like the only way to allow those cases to work while also solving the OP's problem.

I'd kind of hope people aren't using either of those patterns in the wild, but I have no data to back up that hope, and imagine that the chance of me being right is low enough that it's not worth collecting data/waiting to fix this.

(And, agreed on scopes being lame in general :(.)

@wanderview
Member

wanderview commented Feb 2, 2018

Also, while not a great reason, changing our WPT test corpus to a path-only scoping mechanism would take quite a bit of time. We have a lot of tests that use file based scopes I think.

@jungkees
Collaborator

jungkees commented Feb 6, 2018

I found out we had designed it as the exact-path match plus globbing. (see #287.) We dropped the globbing in favor of removing the complexity posed in the OP of that issue.

With the OP of this issue considered, I agree the "secondary measure" is a reasonable option we can take.

Any concerns about adding that condition? Or any other good ideas?

/cc @jakearchibald @slightlyoff @annevk

@mgiuca
Author

mgiuca commented Feb 8, 2018

Ping on this proposal.

To save time reading the above, the amended proposal is to change scope matching so any scope ending with a slash ("/abc/xyz/") also matches the URL without a slash ("/abc/xyz"), but not with any suffixes after that.

So:

  • "/abc/xyz" matches "/abc/xyz/foo" and "/abc/xyz" and "/abc/xyz123".
  • "/abc/xyz/" matches "/abc/xyz/foo" and "/abc/xyz", but does not match "/abc/xyz123".

This is to allow a site like Google Maps to use the scope "https://www.google.com/maps/", which would now match "https://www.google.com/maps" but not "https://www.google.com/mapsearch". Currently there is no way to do this.
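As a rough sketch of the amended rule (paths only, for brevity; `matchesAmended` is a made-up name and not proposed spec text):

```javascript
// Sketch of the amended rule (hypothetical helper, paths only):
// keep the string-prefix match, and additionally let a scope that ends in "/"
// match the same path with that final slash removed.
function matchesAmended(scopePath, urlPath) {
  if (urlPath.startsWith(scopePath)) return true;
  return scopePath.endsWith("/") && urlPath === scopePath.slice(0, -1);
}

// Scope without a trailing slash behaves exactly as it does today.
console.log(matchesAmended("/abc/xyz", "/abc/xyz/foo"));  // true
console.log(matchesAmended("/abc/xyz", "/abc/xyz"));      // true
console.log(matchesAmended("/abc/xyz", "/abc/xyz123"));   // true

// Scope with a trailing slash gains the slashless URL, and nothing else.
console.log(matchesAmended("/abc/xyz/", "/abc/xyz/foo")); // true
console.log(matchesAmended("/abc/xyz/", "/abc/xyz"));     // true
console.log(matchesAmended("/abc/xyz/", "/abc/xyz123"));  // false
```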

@mfalken
Member

mfalken commented Feb 8, 2018

I understand the use case but I'm a bit reluctant to add intelligence/exceptions to the scope-matching algorithm.

It could conceivably break some assumptions in code if a document URL can suddenly be shorter than the scope.

I agree the real solution is to do maps.google.com, or as a workaround can they claim all of "maps*" (i.e., move mapsearch somewhere else?)

Another workaround is two service workers: one at maps and one at maps/, and have maps redirect to maps/ (though that will incur two service worker startup costs).

@mfalken
Member

mfalken commented Feb 8, 2018

There's also been various demand for more expressive scopes, e.g., being able to opt-in and opt-out of various patterns. #566 and #1085 are some examples. I think we should take care to have a solution that meets the big use cases instead of a quick fix now.

@mgiuca
Author

mgiuca commented Feb 8, 2018

I agree the real solution is to do maps.google.com, or as a workaround can they claim all of "maps*" (i.e., move mapsearch somewhere else?)

There is no "mapsearch". This is just a hypothetical example of why the current options are too limited.

Note that just because this is hypothetical does not mean it isn't a problem right now. google.com/maps is real. While google.com/mapsearch is not real, the possibility of google.com/maps* in the future makes it dangerous for them to define a service worker with "google.com/maps" as the scope. But they can't define "google.com/maps/" as the scope, because that would exclude their canonical URL!

This dilemma affects 100% of applications whose scope is not "/". You could argue that they should use "maps.google.com" instead; I tend to agree, but then why does "scope" exist as a concept (why not just automatically scope SWs to the origin)?

I think we should take care to have a solution that meets the big use cases instead of a quick fix now.

I agree that it might be useful to have more expressive scoping. But the issue I'm talking about here means that "scope" as a concept is fundamentally broken. It just isn't broken too badly, which is why not many people are complaining: (1) most scopes are probably "/", and (2) everybody else is probably defining over-broad scopes by accident, or excluding their non-slashed URL by accident, which means subtle breakage rather than catastrophic breakage. I still think this should be fixed as a priority.

@mfalken
Member

mfalken commented Feb 8, 2018

But the issue I'm talking about here means that "scope" as a concept is fundamentally broken.

I think this might be correct. Scopes were never a universally popular thing.

I'd still like to consider the other big use cases before making a quick decision here that could limit what we do later or end up adding more complexity.

Interestingly, it looks like this was considered in issue 3: #3.

@mgiuca
Author

mgiuca commented Feb 8, 2018

I think this might be correct. Scopes were never a universally popular thing.

That may be true, and perhaps the best thing would have been to always scope to the origin and force apps to design around that. But scopes are a thing, so they should work.

Having said that, I do think scope was a valuable feature. While it's easy to say to developers, "oh, you should just use 'maps.google.com' instead of 'google.com/maps'", the reality is that you'd be telling a gigantic organisation like Google Maps that they have to change all of their URLs before they can begin using your technology. That's going to drastically worsen the cost/benefit ratio for implementing Service Workers (and Web App Manifests). So I'd rather we fix scope than simply tell developers, "best to design your site such that scope is /".

Interestingly, it looks like this was considered in issue 3: #3.

Interesting. That issue was closed with "Per today's f2f discussion, this is your app's responsibility."

It looks like that decision was made back in the day when "*" was still part of the syntax. Still, I don't even see then how the app could've taken responsibility for this. There is no way (with or without the * wildcard syntax) to ensure that "/foo" and "/foo/*" are in-scope, but "/foobar" is not.

@slightlyoff
Contributor

slightlyoff commented Feb 9, 2018

Sorry for the slow response. Was discussing with others on another team here, as well as @jungkees. I can imagine a few extensions that would address the full range of issues I'm seeing:

1.) An extension for controlled scopes that marks them as "pathComponent" matches to solve the / issue. This might be an option to registrations.
2.) An extension that marks scopes as "exact"; not matching /foo/* when you only mean to handle /foo. How this and the first intersect is an interesting question.
3.) An extension for auxiliary scopes (as previously discussed, but I can't recall where).

These might come together like:

<html>
<script>
  navigator.serviceWorker.register("/sw.js", {
    scope: "/thinger",
    exact: true
  }).then(...);
</script>
</html>

// sw.js
self.onactivate = (e) => {
  // Auxiliaries don't affect the registration
  e.addAuxiliaryScope({
    // handle all navigations to `/whatevs/*` but not `/whatevslol` (e.g.)
    scope: "/whatevs",
    pathComponent: true
    // `exact` would also be legal here
  });
  // ...
};

Thoughts?

@jungkees
Collaborator

jungkees commented Feb 9, 2018

I thought the "secondary measure" would be good. But after seeing #1272 (comment), adding an option to opt in seems to be a safer option.

From @slightlyoff's proposal, "pathComponent" seems to be able to solve the OP issue. Would we have use cases where "exact" match is required?

@mgiuca
Author

mgiuca commented Feb 9, 2018

@slightlyoff:

1.) An extension for controlled scopes that marks them as "pathComponent" matches to solve the / issue. This might be an option to registrations.
2.) An extension that marks scopes as "exact"; not matching /foo/* when you only mean to handle /foo. How this and the first intersect is an interesting question.

I don't understand the difference between "pathComponent" and "exact" in these two (alternative?) proposals? Seems like they both do the same thing which is "/foo" (without a slash) would only handle "/foo" and "/foo/anything" but not "/foobar".

Having it be opt-in is fine, though it adds to the complexity of correctly setting up a service worker. I'd ask that we try to find a solution that has an analogue in Web App Manifest as well, since this issue also affects Manifest.

@wanderview:

should we consider suborigins instead of expanding scope?

Do you mean force developers to migrate their app to a sub-origin rather than a path?
I addressed this here; I don't think it's going to help adoption if we force developers to migrate their entire site to a new URL. Effectively, we shut off a bunch of existing sites from adopting SW (and, by extension, Manifest).

@wanderview
Member

I meant the sub-origins spec proposal, but I already deleted my comment because I decided the answer was probably "no".

@mgiuca
Author

mgiuca commented Feb 9, 2018

Oh, you're referring to this? I wasn't aware of this proposal.

That would be fantastic; if we could tie SW scope, Manifest (app) scope, and perhaps permission scope and a few other things, into the same concept of a sub-origin, without forcing developers to rewrite their URL scheme. But I'm not sure what the status of this proposal is.

@jakearchibald
Contributor

jakearchibald commented Feb 9, 2018

/foo and /foo/ are different URLs, and may serve different content. One redirecting to the other is something the server is choosing to do, it isn't a web standard. We should be mindful of that.

https://www.google.co.uk/maps doesn't redirect to https://www.google.co.uk/maps/. Both URLs serve the same content. If the page contained a relative link to, say, cat.gif, that URL would resolve to a different place on each page, as link path resolving doesn't auto-add /. Is there any web standard that does the kind of magic we're thinking about? Doing something different to links seems weird.
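The relative-link point can be checked with the standard `URL` constructor. The variable names here are only for illustration:

```javascript
// Relative references resolve against the base URL's last "/" boundary,
// so the trailing slash changes where "cat.gif" points.
const withoutSlash = new URL("cat.gif", "https://www.google.co.uk/maps").href;
const withSlash = new URL("cat.gif", "https://www.google.co.uk/maps/").href;

console.log(withoutSlash); // https://www.google.co.uk/cat.gif
console.log(withSlash);    // https://www.google.co.uk/maps/cat.gif
```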

https://maps.google.com/ exists. It redirects to https://www.google.com/maps. https://maps.google.com/ has the same offline linking issues as https://www.google.com/maps, but none of the proposals in this thread fix that issue.

http://maps.google.com/ also exists, and doesn't appear to use HSTS.

maps.google. + loads of TLDs exist. As does https://www.google.[loads of TLDs]/maps.

There's a lot of hate for service worker scope from standards folks, but remember that it's what allowed developers to use service workers on github pages, rawgit, WPT, and /~username/ static servers. It also allows developers to have a different SW for push vs fetch. An origin-per-app is the ideal, and sub-origins are an interesting workaround where a genuinely different origin can't be used.

It feels like the best short-term solution is:

  1. Add HSTS to http://maps.google.com/, so users are redirected to HTTPS even when offline.
  2. Install a service worker at https://maps.google.com/ to redirect to https://maps.google.com/maps/. This is a good argument to bring back the Link: rel=serviceworker header, as the service worker could be installed when the URL is hit. As a workaround, an iframe pointing to https://maps.google.com/install-sw could be used on https://maps.google.com/maps/ to set this up, but this may hit double-keying complications in Safari.
  3. Install a service worker at https://maps.google.com/maps that redirects to https://maps.google.com/maps/ as long as the path exactly matches.
  4. The https://maps.google.com/maps/ service worker contains the maps logic.
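A redirect-only worker like the one in step 3 could be sketched as below. The file name and the `redirectTargetFor` helper are assumptions made for this example, and the choice of a 302 status is arbitrary. Note that a worker scoped to /maps would still be started for navigations to other URLs beginning with /maps; it just lets them fall through.

```javascript
// sw-redirect.js: sketch of a redirect-only worker scoped to /maps.
// File name and helper name are assumptions made for this example.

// Returns the URL to redirect to, or null when the request should pass through.
function redirectTargetFor(requestUrl) {
  const url = new URL(requestUrl);
  if (url.pathname !== "/maps") return null; // exact path match only
  url.pathname = "/maps/";                   // query string is preserved
  return url.href;
}

// Only wire up the handler when actually running as a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("fetch", (event) => {
    if (event.request.mode !== "navigate") return;
    const target = redirectTargetFor(event.request.url);
    if (target) event.respondWith(Response.redirect(target, 302));
    // Anything else (e.g. /mapsearch) falls through to the network untouched.
  });
}
```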

If https://maps.google.com/mapsearch becomes a thing, they can install their own blank service worker if they want to avoid the startup cost of the https://maps.google.com/maps service worker.

I don't have a good solution for the multiple TLDs. To fix that we'd need a way to scope a service worker to something greater than an origin, like a cert. Ew.

The above is complicated because maps has so many URLs, across so many origins. They have maps.google. * TLDs * HTTP/HTTPS + (maps.google/maps * TLDs) + (maps.google/maps/ * TLDs). Given that, it seems kinda pointless to stress about the https://maps.google.com/maps case specifically.

Is this a mess that service worker should be trying to fix?

@davidcblack

davidcblack commented Feb 9, 2018

The current behavior of ServiceWorker scope matching being a simple string prefix is a bigger problem than just the maps-trailing-slash example. Here's another non-hypothetical example: I want to install a simple ServiceWorker on the Google homepage - www.google.com - in order to speed it up and make some functionality available offline. However, there are a ton of other properties hosted on www.google.com that I don't want this ServiceWorker to be activated for - today any SW that handles requests to www.google.com (or www.google.com/ with the slash) must also intercept every request to www.google.com/maps, www.google.com/flights, www.google.com/search, www.google.com/preferences, and countless more. It's very much not a feasible solution to contact all the unknown number of people who own some random path off of www.google.com and ask them to install an empty ServiceWorker so we don't add latency and potential bugs to their serving path.

@slightlyoff 's proposal around allowing a SW to specify multiple paths and limiting some to the exact specified path rather than treating it as a prefix solves this problem nicely, as well as the maps-trailing-slash case, as well as some others (such as wanting to register for example.com/myapp and example.com/settings but not example.com/betatestapp), in a relatively straightforward way.

edit: reading through the comment thread again, I think there may be some confusion about the "exact" behavior vs the "pathComponent" behavior. "/foo" as a pathComponent would match "/foo", "/foo/", and "/foo/bar/baz", but not "/foobar". On the other hand, "/foo" registered as exact would match only exactly "/foo" and not any of the others.
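In code, the two modes as described in this edit might look like the sketch below. Neither option exists in any shipped API; the mode names follow the thread, `matchesProposedScope` is a hypothetical helper, and the scope is assumed to have no trailing slash.

```javascript
// Sketch of the two proposed matching modes. Neither option exists in any
// shipped API; mode names follow the thread, semantics follow the edit above.
function matchesProposedScope(scopePath, urlPath, mode) {
  if (mode === "exact") return urlPath === scopePath;
  if (mode === "pathComponent") {
    // The scope itself, or anything below it once a "/" is appended.
    return urlPath === scopePath || urlPath.startsWith(scopePath + "/");
  }
  throw new Error(`unknown mode: ${mode}`);
}

for (const path of ["/foo", "/foo/", "/foo/bar/baz", "/foobar"]) {
  console.log(
    path,
    matchesProposedScope("/foo", path, "pathComponent"),
    matchesProposedScope("/foo", path, "exact")
  );
}
```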

@mgiuca
Author

mgiuca commented Feb 12, 2018

@jakearchibald: I think you're fixating on the Maps example too much. Yes, Maps is a mess because they have like 40 non-structurally-related URLs that all redirect to the same domain, and there's not much we can do about it. But a path without a slash being a parent of a path with a slash is universal.

/foo and /foo/ are different URLs, and may serve different content. One redirecting to the other is something the server is choosing to do, it isn't a web standard. We should be mindful of that.

Yes, they are different URLs; redirecting one to the other is a convention, not part of a standard. Note that I am not proposing that they be treated equivalently, or an automatic redirect, but a sensible containership rule.

/foo is a parent of /foo/ and /foo/* (proof: /foo/.. canonicalizes to /foo), while /foo is a sibling of /foobar. Therefore, if you are going to define a URL-based "containership", it makes sense to be able to create a containership boundary that includes /foo, /foo/ and /foo/* (all of which are equal to, or are descendants of /foo), but excludes /foobar. That currently isn't possible for SW or Manifest scope.

If this is not possible for breakage reasons, I'd like for it to at least be possible to define this boundary. I agree that making scope /foo/ include /foo would be unsatisfactory, but it may be the simplest way to design out of this hole we seem to be in. Although perhaps @slightlyoff's suggestion to simply have an opt-in to "correct" behaviour (path-component prefix, not string prefix) is better, because then it avoids the issue of slightly widening the scope, and doesn't change any existing behaviour.

Install a service worker at https://maps.google.com/maps that redirects to https://maps.google.com/maps/ as long as the path exactly matches.

Having the two service workers doesn't buy you anything at all, since you still end up with a SW that handles all URLs that start with /maps, so you still need to install a competing blank SW to cancel out the /maps one. You may as well just have your main SW scoped at /maps.

If https://maps.google.com/mapsearch becomes a thing, they can install their own blank service worker if they want to avoid the startup cost of the https://maps.google.com/maps service worker.

This is not a great solution. Again, this is not a Maps-specific quirk. This affects 100% of SWs that aren't installed at the origin root. Should all such sites be required to install a dummy SW at any paths that share a string prefix with another SW, just because the scoping rules are broken?

Also, this solution doesn't work for Web App Manifest (which suffers the same issue, on account of consistency with the SW spec), because unless the user has installed the /mapsearch app, those URLs would be handled by /maps.

Whether it's opt-in or default behaviour, I would really like to see a solution that makes it possible to define a sane scope that includes slashless paths.

@jakearchibald
Contributor

@mgiuca

/foo is a parent of /foo/ and /foo/* (proof: /foo/.. canonicalizes to /foo)

/foo/.. canonicalizes to / I believe.
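This is quick to confirm with the `URL` constructor (example.com is just a placeholder origin):

```javascript
// A trailing ".." removes the previous segment, leaving the root path.
const dotDot = new URL("/foo/..", "https://example.com").pathname;
// For comparison, a trailing "." keeps the directory.
const dot = new URL("/foo/.", "https://example.com").pathname;

console.log(dotDot); // /
console.log(dot);    // /foo/
```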

You may as well just have your main SW scoped at /maps

But then you have app logic in the SW that may also control /mapsearch. A simple redirect is less likely to be an issue.

I would really like to see a solution that makes it possible to define a sane scope

I think it's a bit much to imply the current scoping rules are 'insane'. Can we tone down the rhetoric, please?

@jakearchibald
Contributor

(I'm not ignoring the points in the above posts. I'll try to summarise them to make sure we're all on the same page.)

@jakearchibald
Contributor

jakearchibald commented Feb 12, 2018

Problem 1: An app is hosted at /foo/, but /foo should also work offline, ideally by being controlled by the same service worker. While the scope of the service worker could be set to /foo, that would also control /foot, which is a different site, and the logic in /foo's service worker may break this site.

Problem 2: An app hosted at / may have its own service worker. URLs such as /login/ may be part of that app and use the same service worker, whereas /foo/ is a different app, and should use its own, or no service worker. Currently a service worker scoped to / will control all pages on the origin, unless there's a service worker registration with a longer matching scope.

Problem 3: An app hosted at example.com/ is also hosted/redirected from example.co.uk/, and both URLs should work offline. Currently a service worker registration is limited to a single origin.

Is that a fair description of the problems?

Possible solution 1: Distinct apps should have their own origin, eg foo.example.com rather than example.com/foo/. This is the preferred solution as origins are the boundary of the web, and therefore the model we should be aiming for.

However, large parts of the web aren't built with this in mind, so for those sites:

Possible solution 2: Sub-origins. These allow arbitrary URLs to become part of another origin, allowing / and /login/* to be one origin, /foo and /foo/* to be another, and /foot/* to be yet another. This comes with the benefits of origin security.

However, since the sub-origin is assigned at response time, it isn't clear to me how this can work with service worker, which needs to select a controlling registration before the request.

Possible solution 3: Add secondary scopes with optional exact-matching. This way you could have service workers scoped to:

  • /foo/ (prefix) but also /foo (exact) and /foo? (prefix).
  • / (exact) but also /? (prefix) but also /login (exact) and /login? (prefix).
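As a sketch of how such a rule set could be evaluated: the rule shape (`{ scope, exact }`) and the helper name are assumptions made for this example, and matching is done against path plus query string on a placeholder origin.

```javascript
// Sketch of solution 3: one registration owns several scope rules.
// The rule shape ({ scope, exact }) is an assumption made for this example.
const fooRules = [
  { scope: "/foo/", exact: false }, // everything under /foo/
  { scope: "/foo", exact: true },   // the slashless URL itself
  { scope: "/foo?", exact: false }, // the slashless URL with a query string
];

const rootRules = [
  { scope: "/", exact: true },
  { scope: "/?", exact: false },
  { scope: "/login", exact: true },
  { scope: "/login?", exact: false },
];

function matchesAnyRule(rules, url) {
  const { pathname, search } = new URL(url, "https://example.com");
  const target = pathname + search;
  return rules.some(({ scope, exact }) =>
    exact ? target === scope : target.startsWith(scope));
}

console.log(matchesAnyRule(fooRules, "/foo?x=1"));  // true
console.log(matchesAnyRule(fooRules, "/foot"));     // false
console.log(matchesAnyRule(rootRules, "/foo/"));    // false
```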

However, this creates big questions around the expected behaviour of getRegistration(). Also, given the mutability of secondary scopes, it creates problems around two registrations trying to have the same scope, and I imagine it'd be really tough to move a scope between service workers (eg, imagine /login/ wanted to get its own service worker).

Possible solution 4: Add an option to registration that means "match this scope URL's path component (as in, ignore search), and also match the scope URL + / as a prefix".

However, this is kinda magic, and doesn't solve problem 2. It could be used in combination with solution 3, as it works around the search component of the URL.

Also, although problem 3 is real-world and creates the same user experience issues as 1 & 2, are we happy to WONTFIX that? If so, why are we less bothered about that case?

@davidcblack

Not sure if sub-origins are a feasible solution to the general problem, as the boundaries a developer might want between Service Workers or pages without a SW don't necessarily match up to origin boundaries. For example, for whatever reason I might want a separate SW to manage my app preferences page or script resource caching or whatever, but still want it to share storage and permissions access with my main app SW. (Not to say I'm not a fan of sub-origins, I just don't think they solve this problem in its entirety.)

Regarding solution 3, while I have no opinion on getRegistration() behavior, could one solution to two SWs trying to register for the same scope simply be that the install of the second fails with a clear and descriptive error? So long as the additional scopes are defined in register and not onactivate that seems like a straightforward and reasonably practical solution.

Really good point about migrating a scope between SWs - we didn't consider that. When we install a SW and give it a cache expiry date does it automatically unregister from its scopes once its cache TTL expires? If so then we may be ok, and I think a reasonable story for migrating a scope such as /login from SW A to SW B would be something like:

  1. A site adds code to register SW A with a 1 week expiration date, claiming both "/" and "/login".
  2. time passes, people want to split "/login" into its own SW.
  3. "/" changes its register call to only register for "/" (exact match) and no longer "/login"
  4. "/login" adds code to install SW B scoped for its own path
  5. over the course of a week, some of the SW B installs fail because A still has the scope, but by the end of the week all the old As have either been re-registered or have expired from the cache, so Bs all register successfully and all users are in a good state.

Does this make sense?

Problem 3 seems like a very hard problem. Not sure how to solve it cleanly without something like the conceptual inverse of sub-origins, where two domains can claim to be actually the same and share resources and the like, which mildly terrifies me. We've been thinking about options like a long-term cached page that is just a full-screen iframe to a canonical origin, though that has obvious latency issues. Given the difficulties here it's not clear it makes sense to tie solving this problem to solving the other problems and it might be better to consider it separately.

@mgiuca
Author

mgiuca commented Feb 14, 2018

/foo/.. canonicalizes to / I believe.

Of course, what was I thinking. OK, I withdraw this; there is no "internal" argument (in URL syntax) that /foo is a parent of /foo/. However, I still think most developers and users would consider things inside /foo/ to be contained within /foo (i.e., /foo/ is not a sibling of /foo, unlike /foot which is).

Ultimately, this all goes back to the Unix file system, where mv /foo <dest> will move not only /foo but everything inside it. But it won't touch /foot. It would be very surprising if /foo meant the same thing as /foo*.

You may as well just have your main SW scoped at /maps

But then you have app logic in the SW that may also control /mapsearch. A simple redirect is less likely to be an issue.

From the point of view of which URLs are captured by SWs, having two SWs instead of one doesn't change things. Either way, you still need a third SW (what I'm going to call an "anti-service-worker" since it exists solely to poke a hole in the parent SW) in /mapsearch and every other path that starts with "maps".
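The "anti-service-worker" effect follows from the longest-match rule. A simplified model, with a made-up helper name, and with "/mapsheet" as a second hypothetical sibling path:

```javascript
// Simplified model of registration selection: among scopes that are a string
// prefix of the URL, the longest one wins. (Hypothetical helper name.)
function pickControllingScope(scopes, url) {
  const matching = scopes.filter((scope) => url.startsWith(scope));
  if (matching.length === 0) return null;
  return matching.reduce((best, scope) => (scope.length > best.length ? scope : best));
}

const origin = "https://www.google.com";
const withoutHole = [origin + "/maps"];
const withHole = [origin + "/maps", origin + "/mapsearch"];

// Without a registration of its own, /mapsearch is controlled by the /maps scope.
console.log(pickControllingScope(withoutHole, origin + "/mapsearch")); // https://www.google.com/maps
// A blank worker registered at /mapsearch takes over for that path only.
console.log(pickControllingScope(withHole, origin + "/mapsearch"));    // https://www.google.com/mapsearch
// Any other path sharing the prefix still lands on /maps.
console.log(pickControllingScope(withHole, origin + "/mapsheet"));     // https://www.google.com/maps
```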

I would really like to see a solution that makes it possible to define a sane scope

I think it's a bit much to imply the current scoping rules are 'insane'. Can we tone down the rhetoric, please?

Well, I used the word "sane", which sounded softer in tone (to me) than explicitly calling the current behaviour "insane".

Is that a fair description of the problems?

Yes, but I believe not all three problems are equally valid. I think Problem 1 is something SW scoping should handle, Problem 2 is a maybe, and Problem 3 is out of scope. I'll try to justify this, but if nothing else, as @davidcblack says, because each problem is significantly harder to solve at the spec level than the last, and if we can easily solve Problem 1, we shouldn't stop because we can't solve the others.

The reason to favour solving P1 is:

  • P1 is, I would consider, a basic expectation, that I can define a scope including a particular directory name and its children, without capturing some of its siblings.
  • P2 is perhaps a problem for certain sites (notably, Google Search whose URL is a parent of many other Google properties), but I'd say there's less of an expectation that I can define a scope that includes /foo/ and everything inside /foo/ except /foo/bar. If you need to solve P2, your URL hierarchy is "too complex".
  • P3 is a problem for a lot of sites, but I wouldn't expect a solution from the web platform. It raises basic issues of ownership. If I own "mysite.com", there's no guarantee that I own "mysite.com.au", and I wouldn't have any expectation that mysite.com could claim the scope of mysite.com.au into its SW. Also, P3 is something a developer can solve themselves (by adding a SW to the other origins), while P1 is not solvable unless you add explicit "anti-service-workers" in all of the other directories that have the same prefix.

Ironically, there seems to have at one point been a "Problem 0": defining a scope that includes several sibling directories that start with a common prefix. I consider that way down on my list of "must have" features of a scope matching algorithm, and can't think of any use case for it. Yet it seems to have been prioritized over Problem 1.

Also, although problem 3 is real-world and creates the same user experience issues as 1 & 2, are we happy to WONTFIX that? If so, why are we less bothered about that case?

I don't agree it creates the same UX issues. First of all, P3 is solvable by the developer, while P1 is not (unless they do the whole "anti-service-worker" trick, and make sure any future developers who play in the same URL space do the same).

Secondly, as a user, I barely notice the trailing slash. In fact, I think most users (who know what a URL is at all), and heck probably most developers, think that a trailing slash is semantically identical to not having it (i.e., that "google.com/maps" and "google.com/maps/" are the same URL). I was certainly under that impression for a long time. So it's quite unintuitive if a user types "google.com/maps" and it doesn't load, and then they are told, "oh, you need to add a trailing slash for it to work offline." On the other hand, anyone can tell that "maps.google.com" and "google.com/maps" are different addresses. Even if people can't tell you the difference between them, and they seem to both work the same, you can understand if you're told, "maps.google.com" is a web page that bounces you to "google.com/maps"; even though "google.com/maps" works offline, if you type "maps.google.com" you need to be online for it to work.

The possible solutions:

  • S1 is not really acceptable, as you point out.
  • S2 and S3 are really complex, introducing a bunch of gotchas and questions that you raise. They might help to solve P2 and P3, but I don't think they are crucial.
  • S4 solves P1.

However, this is kinda magic

I don't get why matching path segments is "magic". If scope always matched path segments, would you consider it magic? In other words, is the "magic" the path matching algorithm itself, or the fact that it's something the developer has to opt into?

If the former: ... why? Matching path segments is the basic way of telling if one URL (or Unix path) is a prefix of another.

If the latter: I agree, it sucks that you'd have to opt in to getting the "right" behaviour. But the ship has sailed on having the right behaviour by default. I'd rather tell web devs, "use this flag to opt in to path prefix" than "make sure every URL that ever starts with the four letters 'maps' registers a blank service worker."
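To make the difference concrete, here is a rough sketch of the two matching rules side by side. The function names are made up for illustration, and the segment matcher is only one possible reading of "path prefix": it ignores query strings and treats a scope with or without a trailing slash the same way, which is an assumption rather than anything the spec defines.

```javascript
// Current behaviour: a plain string prefix match on the serialized URLs.
function matchesByStringPrefix(scope, url) {
  return url.startsWith(scope);
}

// Hypothetical opt-in behaviour: the scope must match whole path segments.
function matchesByPathSegment(scope, url) {
  const s = new URL(scope);
  const u = new URL(url);
  if (s.origin !== u.origin) return false;
  // Drop one trailing slash so "/maps" and "/maps/" describe the same scope.
  const base = s.pathname.endsWith('/') ? s.pathname.slice(0, -1) : s.pathname;
  return u.pathname === base || u.pathname.startsWith(base + '/');
}

const scope = 'https://www.google.com/maps';
console.log(matchesByStringPrefix(scope, 'https://www.google.com/mapsearch')); // true
console.log(matchesByPathSegment(scope, 'https://www.google.com/mapsearch'));  // false
console.log(matchesByPathSegment(scope, 'https://www.google.com/maps'));       // true
console.log(matchesByPathSegment(scope, 'https://www.google.com/maps/place')); // true
```

With the segment rule, "/maps" and "/maps/" are both in scope while "/mapsearch" is not, which is the behaviour P1 asks for.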

Now I've finished every email to this thread mentioning Web App Manifest and nobody has responded about it. I'd like to have the same scope matching solution for both SW and Manifest. I'm happy if we come to the conclusion that they have different requirements and should therefore go in separate directions on this. But I think it would be best if there is a consistent approach for both. (Note: I am not really invested in Service Workers here; I'm just trying to ensure consistency between the two specs, since the Manifest spec explicitly cites the SW spec for why it has this behaviour.)

@jakearchibald
Contributor

@davidcblack

When we install a SW and give it a cache expiry date, does it automatically unregister from its scopes once its cache TTL expires?

Nah, your service worker registration lasts, in theory, forever.

Taking the example where a SW has scopes / and /login/, and a new SW wants to control /login/, you could, on visit to /login/:

  1. Let reg be the registration with a scope matching /login/.
  2. If reg is not null, re-register reg with the same url & settings, but without the /login/ scope.
  3. Register the new service worker for /login/.

You'd need to update the register call for the / SW, and be aware that this will reject until you do so.

This is pretty hard, and it'll behave differently depending on which step happens first, so the issue may not be caught until things go live. It seems more risky, and requires more inter-app awareness than the blank service worker idea I posted earlier.

@jakearchibald
Contributor

jakearchibald commented Feb 14, 2018

@mgiuca

Ultimately, this all goes back to the Unix file system, where mv /foo will move not only /foo but everything inside it

Yeah, but that isn't how URLs work. On the file system /foo can be a directory or a file, but you can't have a directory and file of the same name. URLs don't have the concept of directories.

you still need a third SW (what I'm going to call an "anti-service-worker" since it exists solely to poke a hole in the parent SW) in /mapsearch and every other path that starts with "maps".

I agree it isn't a pretty solution, but we should compare the proposed solutions to it.
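For comparison purposes, here is a small simulation of how the blank service worker approach plays out under the current rule, where the registration with the longest scope that prefixes the URL wins. The `matchRegistration` helper and the registration objects are illustrative only, not a real API.

```javascript
// Pick the registration whose scope is the longest string prefix of the URL.
function matchRegistration(registrations, url) {
  let best = null;
  for (const reg of registrations) {
    if (url.startsWith(reg.scope) &&
        (best === null || reg.scope.length > best.scope.length)) {
      best = reg;
    }
  }
  return best;
}

const registrations = [
  { name: 'maps', scope: 'https://www.google.com/maps' },
  // Hypothetical "anti-service-worker": exists only to shadow the /maps scope.
  { name: 'blank', scope: 'https://www.google.com/mapsearch' },
];

console.log(matchRegistration(registrations, 'https://www.google.com/maps/place').name); // maps
console.log(matchRegistration(registrations, 'https://www.google.com/mapsearch').name);  // blank
// Any other path that starts with "maps" is still captured by the maps SW.
console.log(matchRegistration(registrations, 'https://www.google.com/mapsold').name);    // maps
```

The last line is the cost of this approach: each sibling path sharing the prefix needs its own blank registration.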

Well, I used the word "sane", which sounded softer in tone (to me) than explicitly calling the current behaviour "insane".

Fair enough. It felt implied to me. By wanting a "sane" solution, the implication is that the current solution isn't that, therefore "insane".

Ironically, there seems to have at one point been a "Problem 0": defining a scope that includes several sibling directories that start with a common prefix. I consider that way down on my list of "must have" features of a scope matching algorithm, and can't think of any use case for it. Yet it seems to have been prioritized over Problem 1.

That was never a specific use-case. Prefix matching was chosen as it matches how URLs behave. Yes, there are conventions that treat URLs differently, but as we've seen with the maps example, some conventions are pretty weird, like serving the same content from multiple URLs.

It's also convention that URLs that end in .jpg are jpeg-encoded images. However, this is not enforced, nor is it universal. File systems offer special treatment to extensions, the web doesn't.

heck probably most developers, think that a trailing slash is semantically identical to not having it

Relative URLs behave quite differently.
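As a quick illustration using the standard `URL` constructor (the relative path `directions` is just a placeholder), the same relative reference resolves to different places depending on whether the base ends in a slash:

```javascript
// Without a trailing slash, "maps" is treated as the last segment and replaced.
const withoutSlash = new URL('directions', 'https://www.google.com/maps');
// With a trailing slash, the relative path is resolved inside "maps/".
const withSlash = new URL('directions', 'https://www.google.com/maps/');

console.log(withoutSlash.href); // https://www.google.com/directions
console.log(withSlash.href);    // https://www.google.com/maps/directions
```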

So it's quite unintuitive if a user types "google.com/maps" … On the other hand, anyone can tell that "maps.google.com" and "google.com/maps" are different addresses

Is typing URLs in full a use case worth considering? I don't think it is real-world these days. Users are clicking on links, icons, or relying on autocomplete in the address bar.

When I type "maps" into my URL bar, the first autocompletion offered to me is maps.google.co.uk. There's nothing here to tell me I'm picking the wrong URL if I want to work offline.

If a user clicks a link that points to maps.google.co.uk and it fails to load offline, do you really think they'll go "ahh yes, this URL is wrong, I should correct it to google.com/maps/"?

I don't get why matching path segments is "magic".

Can you name another web API that treats foo as foo/? What would you call the option that enables this behaviour?

it sucks that you'd have to opt in to getting the "right" behaviour.

By "right" you mean differently to how every other web API treats URLs.

I'm a little confused that we've been presented with a series of real-life use cases, but now we're being asked to ignore some of them because a solution has been suggested that solves a subsection of the problems. That said, solution 4 seems the least problematic, and doesn't feel like it clashes with possible solutions to problem 2.

@annevk
Member

annevk commented Feb 16, 2018

GitHub does the foo == foo/ thing too, but I'd argue that the expected behavior for them is to be siblings, not equals. And especially given that many servers have directory-based policies it seems potentially dangerous to allow for this escape (unless there's some kind of explicit opt-in as proposed by @slightlyoff; though it's not entirely clear how that works together with the header you need to specify as well).

@jakearchibald
Contributor

jakearchibald commented Feb 16, 2018

https://jakearchibald.github.io/thing.txt and https://jakearchibald.github.io/thing.txt/ are different resources. Update: Hah, no, I was caught out by caching. GitHub gives priority to project names over organisations, so even though https://jakearchibald.github.io/thing.txt exists, it can't be accessed, because https://jakearchibald.github.io/thing.txt/ exists.

And especially given that many servers have directory-based policies it seems potentially dangerous to allow for this escape

Absolutely.

navigator.serviceWorker.register('/thing/sw.js', {
  treatScopeLikeFilesystemPath: true
});

The above would require a service worker that could be scoped to /thing. Unless Service-Worker-Allowed is used, the default maximum scope would be applied, which is /thing/, so it would reject.
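A simplified sketch of that check, assuming the default maximum scope is the directory containing the script and that a `Service-Worker-Allowed` value, when present, replaces it. The helper names and the `example.com` origin are placeholders, and this only compares paths on the same origin:

```javascript
// Default maximum scope: the directory that contains the service worker script.
function defaultMaxScope(scriptURL) {
  return new URL('./', scriptURL).pathname;
}

// A requested scope is allowed only if its path starts with the max scope path.
function scopeAllowed(scriptURL, scope, serviceWorkerAllowed) {
  const script = new URL(scriptURL);
  const requested = new URL(scope, script.origin);
  const maxScope = serviceWorkerAllowed !== undefined
    ? new URL(serviceWorkerAllowed, script).pathname
    : defaultMaxScope(script);
  return requested.origin === script.origin &&
    requested.pathname.startsWith(maxScope);
}

const script = 'https://example.com/thing/sw.js';
console.log(defaultMaxScope(script));           // /thing/
console.log(scopeAllowed(script, '/thing/'));   // true
console.log(scopeAllowed(script, '/thing'));    // false, so register() would reject
console.log(scopeAllowed(script, '/thing', '/')); // true, widened by the header
```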

@slightlyoff
Contributor

As an FYI, sub-origin work has stopped in Chrome and that code is being removed. I don't expect it's going to assist us here unless Mozilla (or someone else) picks it up and champions it generally.
