Pattern for handling fingerprinted assets in cache? #657

Open
brittanystoroz opened this Issue Mar 18, 2015 · 12 comments

brittanystoroz commented Mar 18, 2015

Is there any discussion around handling fingerprinted assets in the SW cache? Say your app's build process adds a fingerprint to an asset’s filename like style-3b491f3c.css. And when you update that file, the next version of it is named style-9b9c2a06.css.

I get that Service Workers would remove the need for fingerprinting files like that, but for people who already have their build process in place/want fingerprinting where SW is not supported...what might be the best pattern for handling this?

Initially I imagine you have to:

Search the cache for a "match" on your asset style-9b9c2a06.css, regardless of a matching fingerprint

If an exact match is found (the fingerprints are equal):

  1. return the cached resource and call it a day.

If an exact match isn't found, but an older version of the resource is available (style-3b491f3c.css):

  1. return this resource from the cache
  2. fetch the newer version style-9b9c2a06.css from the network for later
    3a) if the network request succeeds, add style-9b9c2a06.css to the cache, and remove the old resource style-3b491f3c.css
    3b) if the network request fails, just keep using the old version and call it a day.

If no match is found:

  1. fetch resource from the network & save it to the cache for next time
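The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the fingerprint is assumed to be a dash plus 8 hex digits right before the file extension, and `stripFingerprint`, `findOtherVersion`, and `respondWithBestVersion` are hypothetical helper names, not part of the Service Worker spec. The only Cache methods used are `match`, `keys`, `put`, and `delete`.

```javascript
// Assumed fingerprint shape: "-" plus 8 hex digits right before the file
// extension (style-3b491f3c.css). Adjust the pattern to your build tool.
const FINGERPRINT = /-[0-9a-f]{8}(?=\.[^./]+$)/;

// style-3b491f3c.css and style-9b9c2a06.css both reduce to style.css.
function stripFingerprint(url) {
  return url.replace(FINGERPRINT, '');
}

// Scan the cache keys for an entry that differs from `url` only by fingerprint.
async function findOtherVersion(cache, url) {
  const wanted = stripFingerprint(url);
  const keys = await cache.keys();
  return keys.find((req) => req.url !== url && stripFingerprint(req.url) === wanted);
}

// cache: a Cache object; url: absolute URL of the requested asset;
// fetchFn: fetch or a stand-in; waitUntil: receives the background promise.
async function respondWithBestVersion(cache, url, fetchFn, waitUntil = () => {}) {
  const exact = await cache.match(url);
  if (exact) return exact; // fingerprints are equal: done

  // Fetch the requested version, cache it, then evict the older entry (if any).
  const refresh = async (staleKey) => {
    const response = await fetchFn(url);
    if (response.ok) {
      await cache.put(url, response.clone());
      if (staleKey) await cache.delete(staleKey);
    }
    return response;
  };

  const staleKey = await findOtherVersion(cache, url);
  if (!staleKey) return refresh(); // nothing cached: go to the network

  const staleResponse = await cache.match(staleKey);
  // Network failure is swallowed on purpose: keep serving the old version.
  waitUntil(refresh(staleKey).catch(() => {}));
  return staleResponse;
}
```

In a fetch handler this would presumably be wired up by passing `event.request.url`, the global `fetch`, and a callback that forwards the background promise to `event.waitUntil`, so the worker stays alive until the new version is stored.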

Talked with @wanderview briefly about this and he mentioned there used to be a prefixMatch to support similar use cases, but it was nixed because of complexity/inefficiency/lameness.

Perhaps solving this is decidedly outside the scope of the spec for the time being -- in which case I'll just write something to sit on top -- but I also wouldn't be surprised if this turns out to be a more common issue.

@brittanystoroz brittanystoroz changed the title Pattern for handling fingerprinted assets? Pattern for handling fingerprinted assets in cache? Mar 18, 2015

Member

annevk commented Mar 18, 2015

If you put the fingerprint in the query component of the URL you can use ignoreSearch. If you have it elsewhere you'd need to build something on top.
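For the query-string case, a minimal sketch might look like this. It assumes URLs of the form `/style.css?v=3b491f3c` (the `v` parameter is made up for illustration) and uses the `ignoreSearch` option of `cache.match()`; `matchAnyVersion` is a hypothetical helper name.

```javascript
// Try the exact URL first, then any cached entry with the same path
// regardless of its query string (and therefore its fingerprint).
async function matchAnyVersion(cache, url) {
  const exact = await cache.match(url);
  if (exact) return { response: exact, stale: false };

  const any = await cache.match(url, { ignoreSearch: true });
  return any ? { response: any, stale: true } : null;
}
```

The `stale` flag lets the caller decide whether to kick off a network fetch for the newer fingerprint.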

brittanystoroz commented Mar 18, 2015

@annevk thanks for the response -- unfortunately I was hoping to avoid doing it like that. I feel like most build processes don't construct URLs that way. (I could be wrong.) There's also 1,001 opinions about how RESTful it is to use query strings, but people seem to be less averse to the idea of avoiding them.

Regardless of how people prefer to build their URLs, if I'm going to write something to sit on top, I'd want to take as many scenarios into account as possible. Wouldn't want to force people into changing their build processes.

So ignoreSearch will help a certain percentage, but what about the others? I'm starting to look into VARY headers after some enlightenment from Ben but oh my O_O

Member

annevk commented Mar 18, 2015

If people use different strategies you'll always have to build something generic on top, no? Have some kind of side table that tracks the resource-to-URL mapping.
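One way such a side table could look, as a rough sketch: the `AssetTable` class and the two mapping functions are hypothetical, and the fingerprint formats are assumptions. Because a service worker can be stopped at any time, a real version would presumably need to persist the table (for example in IndexedDB) rather than keep it in memory.

```javascript
// Side table: logical resource name -> fingerprinted URL currently cached.
// `toLogicalName` is supplied by the site, so any URL scheme can plug in.
class AssetTable {
  constructor(toLogicalName) {
    this.toLogicalName = toLogicalName;
    this.current = new Map();
  }

  // Which cached URL (if any) stands for the same resource as `url`?
  lookup(url) {
    return this.current.get(this.toLogicalName(url));
  }

  // Record `url` as current; returns the URL it replaced so the caller
  // can evict that entry from the cache, or undefined if nothing changed.
  record(url) {
    const name = this.toLogicalName(url);
    const previous = this.current.get(name);
    this.current.set(name, url);
    return previous !== url ? previous : undefined;
  }
}

// Two example strategies (both formats are assumptions):
const byFilename = (url) => url.replace(/-[0-9a-f]{8}(?=\.[^./]+$)/, '');
const byQuery = (url) => url.split('?')[0];
```

A site using `style-3b491f3c.css` would construct `new AssetTable(byFilename)`; one using `style.css?v=3b491f3c` would pass `byQuery` instead.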

brittanystoroz commented Mar 18, 2015

Yea, this is true. As an official spec, ServiceWorkers would have the power to make a certain strategy more advantageous...but in saying that, I realize Service Workers shouldn't really be in the business of persuading URL structures. Which I guess is what it would have to do in order to cater to the initial use case I brought up.

Sigh.

Member

wanderview commented Mar 18, 2015

I'll explain why the implementation is hard below, but if we can make this use case easier it will help developers adopt SWs faster. Vary headers are harder to configure and understand. Search URL components are not very RESTful and don't seem to fit what is often done today. Currently the only other option is to call cache.keys() and search manually.

I wonder if we could add a simple string "tag" to a request and then let match(request, { tag: "foo" }) only return results that also have tag="foo"? What do you think about that option @jakearchibald?


The implementation problem with prefixMatch, or anything that does a regex on the URL automatically, is that it's hard to achieve all of the following:

  1. Fast
  2. Uses reasonable disk space
  3. Uses reasonable runtime memory

We either have to create an index on the URL in our storage DB, which can use a lot of disk space, or we have to keep everything in memory to scan it quickly. It just kind of sucks. :-(

Exact matches are not as bad because we can hash them which keeps the DB index a reasonable size.

brittanystoroz commented Mar 18, 2015

If the tag option isn't too messy to implement, I like that a lot. The problem isn't really about structuring URLs (as I was thinking earlier). It's just a matter of having Service Workers be smart enough to know that 2 files are related/loose equivalents.

If Service Workers can identify a relationship like that between files, developers can easily handle which assets to serve, determine their fallbacks, etc.

Member

wanderview commented Mar 18, 2015

To clarify the tag idea a bit:

// new argument to associate cache-specific info in the entry.  Tag also applies for finding existing
// requests to remove during the put() algorithm.
cache.put(request, response, { tag: 'foo' });

// these do not find entry with 'foo' tag
cache.match(request);
cache.match(request, { tag: 'bar' });

// these do find the entry with 'foo' tag
cache.match(request, { tag: 'foo' });
cache.match(request, { ignoreTag: true });

// finds all entries with 'foo' tag regardless of url
cache.matchAll(null, { tag: 'foo' });

Notably, there is no way to inspect a request or response for the tag. If content needs that, then it would have to add a header to the response object, etc.
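Since the `tag` option above is only a proposal, the header-based workaround mentioned here could be approximated today along these lines. `X-Cache-Tag`, `withTag`, and `matchAllByTag` are made-up names for illustration. The sketch assumes responses whose headers are readable; opaque (no-cors) responses cannot be re-wrapped this way.

```javascript
// Hypothetical header standing in for the proposed `tag` option.
const TAG_HEADER = 'X-Cache-Tag';

// Copy a response, adding the tag header, before handing it to cache.put().
function withTag(response, tag) {
  const headers = new Headers(response.headers);
  headers.set(TAG_HEADER, tag);
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

// Emulates matchAll(null, { tag }) by reading every entry and filtering.
// This is a full scan, so it will not be fast on large caches.
async function matchAllByTag(cache, tag) {
  const responses = await cache.matchAll();
  return responses.filter((res) => res.headers.get(TAG_HEADER) === tag);
}
```

Usage would be something like `cache.put(request, withTag(response, 'foo'))`, followed later by `matchAllByTag(cache, 'foo')`. Unlike the proposal, the tag here does not take part in `put()` replacement, so evicting the older entry remains the caller's job.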

Collaborator

jakearchibald commented Oct 28, 2015

@brittanystoroz (firstly, really sorry it's taken so long to get to this)

> I get that Service Workers would remove the need for fingerprinting files like that

I don't think that's true. You can fight good http caching with SW, but it's not the optimal thing to do.

Tag feels a bit too specific as a fix here. If we see this pattern we can bring back prefixMatch, as I've run into a few cases where I'd have liked to use it.

Going to file this under future ideas.

@jakearchibald jakearchibald added this to the Future ideas milestone Oct 28, 2015

Contributor

jeffposnick commented Oct 28, 2015

If there's asset fingerprinting then there's almost certainly a build process for the site. If that's the case, then modifying the service worker script during the build process is an option, and that modification can include injecting an array of the up-to-date fingerprinted URLs into the script file.

This also plays nicely with the service worker lifecycle, in that the modified script will kick off a new install/update flow, letting you handle additional caching and expiration inside the standard events.

You do have to think a bit about timing in order to ensure that the cache is properly populated with the latest fingerprinted assets by the time the URLs in the controlled page are updated to refer to those assets.
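A rough sketch of that approach, with the caveat that the URL list, the cache name `precache`, and the helper names are all placeholders. A build step is assumed to rewrite `PRECACHE_URLS` on every deploy, which changes the script bytes and so triggers the update flow.

```javascript
// Placeholder values; a build step would overwrite this array on each deploy.
const PRECACHE_URLS = ['/style-9b9c2a06.css', '/app-41d0c2e7.js'];

// install: fetch and store whatever the current build lists but the cache lacks.
async function precache(cache, urls) {
  const missing = [];
  for (const url of urls) {
    if (!(await cache.match(url))) missing.push(url);
  }
  await cache.addAll(missing);
  return missing;
}

// activate: drop entries that the current build no longer lists.
async function removeStale(cache, urls, baseUrl) {
  const wanted = new Set(urls.map((u) => new URL(u, baseUrl).href));
  const keys = await cache.keys();
  const stale = keys.filter((req) => !wanted.has(req.url));
  await Promise.all(stale.map((req) => cache.delete(req)));
  return stale.map((req) => req.url);
}

// Wire the helpers to the lifecycle events only when running as a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  const open = () => caches.open('precache');
  self.addEventListener('install', (event) => {
    event.waitUntil(open().then((cache) => precache(cache, PRECACHE_URLS)));
  });
  self.addEventListener('activate', (event) => {
    event.waitUntil(open().then((cache) => removeStale(cache, PRECACHE_URLS, self.location.href)));
  });
}
```

Deferring the cleanup to `activate` is one way to handle the timing concern: old assets stay available while pages controlled by the previous worker may still reference them.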

That being said, I'd love to see some additional metadata, be it tag or something else, added to the Cache Storage API. My specific use case, in the sw-precache library, is related to the issue described here: resources that don't necessarily have fingerprints in their filenames, but get fingerprinted at build time, and that metadata about each file needs to be stored somewhere. The workaround I'm using is to create individual Cache objects, each with one entry and with a name that includes the fingerprint, and it's kind of ugly.
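To make the workaround concrete, here is a sketch of a one-cache-per-resource naming scheme. The name format below is invented for illustration and is not necessarily what sw-precache uses; `cacheNameFor` and `staleCacheNames` are hypothetical helpers.

```javascript
// Invented format: prefix, fingerprint, and URL joined with "|".
const CACHE_PREFIX = 'asset-v1';

function cacheNameFor(url, hash) {
  return [CACHE_PREFIX, hash, url].join('|');
}

// Given the existing cache names and the current build manifest
// ({ url: hash }), list the caches that no longer match any asset.
function staleCacheNames(existingNames, manifest) {
  const current = new Set(
    Object.entries(manifest).map(([url, hash]) => cacheNameFor(url, hash))
  );
  return existingNames.filter(
    (name) => name.startsWith(CACHE_PREFIX + '|') && !current.has(name)
  );
}
```

In a worker, the names would come from `caches.keys()` and each stale name would be passed to `caches.delete()`. The fingerprint lives in the cache name because a cache entry itself has no slot for this metadata.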

Member

wanderview commented Oct 28, 2015

> Tag feels a bit too specific as a fix here. If we see this pattern we can bring back prefixMatch, as I've run into a few cases where I'd have liked to use it.

From an implementation point of view, something like a tag would be easier to implement. prefixMatch() is going to be a de-opt which is why we removed it in the first place.

scottohara commented Nov 11, 2015

> If there's asset fingerprinting then there's almost certainly a build process for the site. If that's the case, then modifying the service worker script during the build process is an option, and that modification can include injecting an array of the up-to-date fingerprinted URLs into the script file.

Unless I misunderstand the point, having the build process inject an array of updated fingerprinted URLs into the serviceworker script assumes an on install strategy for cache population?

Whereas I think the behaviour described by the OP is more like the stale-while-revalidate pattern.

Except that in this context, we don't get a cache hit that we can respond with, while we go out to the network to revalidate the cached asset.

Rather, we get a cache miss, and we want the serviceworker to respond with a different cache entry while we go to the network and fetch the missing resource (and later remove that cache entry once the missing resource has successfully fetched/cached).

FWIW, this was the same stumbling block I hit when looking at introducing ServiceWorker to an existing app that uses fingerprinted assets (which led me here to this issue).

Switching to a cache-falling-back-to-network strategy would work OK with fingerprinted assets, but you lose the ability to provide an early (but stale) response in the event of a cache miss, and there's still the question of how to clean up stale cache entries once a newer fingerprinted version of the asset is available in the cache.

scottohara commented Nov 19, 2015

FWIW, @jeffposnick ...just watched your talk at Chrome Dev Summit, which convinced me to look again at sw-precache. I have to say: super easy to use, and does exactly what it says on the tin.

Under Cache Storage on the DevTools Resources tab, I can see what you mean by using multiple caches, with a single entry per cache; and I agree it's not ideal...but hey, it does the job.

Nice work.
