This repository has been archived by the owner on May 5, 2022. It is now read-only.

Add support for cache busting via etag header #68

Closed
alcidesv opened this issue Jun 27, 2016 · 7 comments
Comments

@alcidesv

+@kazuho

It would be useful to add an etag attribute to the Link HTTP header, and possibly to the HTML <link> tag as well, to indicate the most up-to-date version of an asset and thereby enable cache busting.

For example, instead of having

    Link: </styles.css?v=4921ef32>; rel=preload

it would be nice to have

    Link: </styles.css>; rel=preload; etag=4921ef32

This is a solution proposed by Kazuho Oku to the problem of cache busting; see his original comment at https://lists.w3.org/Archives/Public/ietf-http-wg/2016AprJun/0419.html. For completeness, I'm pasting it here:

Considering the fact that H2 push is optional (and Cache Digests for
HTTP/2 is an option to H2 push), I am afraid it might not become the
standard way to update a fresh response.

If we are to solve cache busting, it might be better to consider
adding a validator to the link-rel-preload header. For example, we
could add a etag attribute, that presents the etag value that is
expected to be used together with the response that contained the
link header. Upon observing such header, clients can check their
cache, and in case the validators do not match, remove the response
from cache and issue a new request.

In other words,

link: </style.css>; rel=preload; etag=deadbeef

would instruct the web browser to fetch /style.css if it is not
fresh-cached, or if the etag value of the cached response does not
match the value supplied by the header.

The pros of this approach would be that it could be used over both
HTTP/1 and HTTP/2, and that it does not require the browser to
implement H2 push. The cons would be that it would only be possible to
invalidate responses as part of the preload process.
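As a rough illustration of the header syntax in the quoted proposal, here is a minimal Python sketch that extracts the target and parameters from a Link header. The function name `parse_link_header` is hypothetical, and the `etag` parameter is only the attribute proposed in this thread, not a standardized one. The parsing is deliberately simplified: it assumes no commas or semicolons inside quoted parameter values.

```python
import re
from typing import Dict, List


def parse_link_header(value: str) -> List[Dict[str, str]]:
    """Parse a Link header value into a list of dicts (simplified sketch)."""
    links = []
    # Split on commas that are followed by "<", i.e. the start of a new link-value.
    for part in re.split(r",\s*(?=<)", value.strip()):
        match = re.match(r"<([^>]*)>(.*)", part)
        if not match:
            continue  # skip anything that does not look like <target>; params
        link = {"target": match.group(1)}
        for param in match.group(2).split(";"):
            if "=" in param:
                name, _, val = param.partition("=")
                link[name.strip().lower()] = val.strip().strip('"')
        links.append(link)
    return links


links = parse_link_header("</style.css>; rel=preload; etag=deadbeef")
print(links[0]["target"], links[0]["etag"])
```

A client following the proposal would then compare the parsed `etag` value with the validator of its cached copy of the target.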

The problem of cache busting has been brought up multiple times on the mailing list; see for example the thread linked above, and this email (which happens to contain a few useful pointers):

https://lists.w3.org/Archives/Public/ietf-http-wg/2016AprJun/0416.html

@kazuho

kazuho commented Jun 27, 2016

To be precise, supplying a validator (e.g. an etag) in the link-rel-preload header can have two effects, depending on the result of the validation.

i. force refetch of a fresh response

If the validator specified in the etag attribute is not equal to that of a fresh response in the client's cache, it could act as an indicator to refetch the resource.

ii. 0-RTT validation of a stale response

If the validator specified in the etag attribute is equal to that of a stale response in the client's cache, it could have the same effect as a pushed 304; i.e. the client can treat the cached resource as fresh, thereby omitting the roundtrip for a conditional GET.

The second point (which I did not mention in my original email) means that, if we adopt the proposed attribute, websites would no longer need to set expires or max-age on their asset files to keep them fresh for a long period. The asset files can be stored in the cache as stale, since they can be validated instantly using the etag attribute.
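The two effects can be sketched as a small decision function. This is only one reading of the proposal under stated assumptions: the `CachedResponse` type, the `preload_action` function, and the action names are hypothetical, and a real browser cache has many more states than "fresh" and "stale".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    etag: str    # validator stored with the cached response
    fresh: bool  # True while the response is within its freshness lifetime


def preload_action(cached: Optional[CachedResponse], link_etag: str) -> str:
    """Decide what to do on seeing `rel=preload; etag=<link_etag>`."""
    if cached is None:
        return "fetch"             # nothing cached: ordinary preload fetch
    if cached.etag != link_etag:
        return "refetch"           # (i) mismatch: evict and fetch again
    if cached.fresh:
        return "use-cache"         # fresh and matching: nothing to do
    return "revalidated-0rtt"      # (ii) stale but matching: treat as fresh


print(preload_action(CachedResponse("deadbeef", fresh=False), "deadbeef"))
```

In this sketch, effect (i) is the `"refetch"` branch and effect (ii) is the `"revalidated-0rtt"` branch, which replaces the conditional GET a stale entry would otherwise need.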

@drzraf

drzraf commented Jun 30, 2016

That's really great! I haven't read the preload WD, but here are a couple of comments anyway:

  • I hadn't thought about the second effect of avoiding 304s. It needs careful thought, but that's a great "side effect" too!
  • If validators are delivered by a cached resource (i.e. the output of the first resource, the HTML, is itself cached), care must be taken with the sub-validators it embeds:
    • Can a 304 response contain a Link header (AFAIK it's not prohibited)? What happens if those Link headers contain an etag?
    • If an HTML response is known to be cached (although fresh enough according to an Age header or any other means), could we really skip checking assets the safe/usual way?
  • If the same attribute is available via both the Link header and the <link> markup:
    • What if the ETag appears in the header and not in the markup?
    • What if the ETag appears in the markup and not in the header?
    • What if the ETag appears in both the header and the markup with different values?

Considering that:

  • ETags are often left to the web server to calculate, especially for static files, and it's safe to assume that the original ETags will still be generated by the web server
  • web applications that want (i) or (ii) (forcing a refresh, or speeding up page loads by avoiding a 304 on every asset) would have to generate the ETags themselves, and it's safe to assume that a CMS can do that using HTTP headers (they usually know all the assets to link before output even starts, and offer an API to, among other things, "enqueue" them). But as of today, it is more common to use HTML <link> markup for this purpose.

Then:

  • It's unlikely that an end-user component (e.g. custom theme overrides, plugins, hooks) would care about generating ETags in HTML <link> markup (but if one does, it would make sense to give it priority).
  • But the possibility of ETags inside the HTML markup (as a <link> attribute) forces the UA to postpone ETag comparison and preloading until after the HTML is parsed (especially if every ETag in the header can be overridden in the HTML). This would clearly be suboptimal and would somewhat defeat the point of preloading.

Keeping this concept only inside an HTTP header makes sense performance-wise (and eases browser implementation), but current web applications mostly use <link> markup to indicate assets.

It is worth noting that cache busting is not only used (or needed) by dynamically generated web pages for their assets; CSS frameworks need it for their own dependencies too.
For example, the font-awesome CSS file appends a query string to the .eot font file it includes (the version number update is built into their CSS build process).
In this example the HTML itself never knows about, nor <link>s to, the .eot font file.

  • [joke] But then, should an ETag be added to the CSS3 url() or @import directives?
  • So freshness hinting for subresources/included resources (ii) and, more importantly, server-initiated forced cache invalidation (i) are great, but where should this greatness stop?

@igrigorik
Member

It's not clear to me what the actual win here is, over a versioned + long-lived filename:

  • Perhaps aesthetically it doesn't seem as "nice" to have opaque tokens in URIs, but.. meh? It works and it works reliably in any client regardless of what other features it supports.
  • A versioned + max-ttl asset shouldn't suffer from revalidation RTT's -- note: there are related discussions and experiments around this on http-wg / immutable; we should let that play out.
  • Loading a new asset is as simple as changing filenames, with the benefit that you don't have to teach every cache along the way about additional validation tokens in Link headers.
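For comparison, the versioned + long-lived filename approach described in the list above might look like the following sketch. The helper name `versioned_name`, the 8-character digest length, and the one-year `max-age` are illustrative assumptions, not something prescribed in this thread.

```python
import hashlib
from pathlib import PurePosixPath

# Long-lived caching header served with every versioned asset (assumed value).
LONG_LIVED_HEADERS = {"Cache-Control": "max-age=31536000"}


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Embed a content hash in the filename: /css/styles.css -> /css/styles.<hash>.css"""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = versioned_name("/css/styles.css", b"body { color: black }")
new_url = versioned_name("/css/styles.css", b"body { color: navy }")
print(old_url != new_url)  # a content change yields a new URL, so no cache needs to be invalidated
```

Because the URL itself changes, intermediaries and clients need no knowledge of extra validation tokens, which is the deployability argument made above.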

In short, even if we were to pursue this.. I don't think it's practically deployable?

@kazuho

kazuho commented Jun 30, 2016

@igrigorik Thank you for your response.

If we view this as a way to invalidate a fresh cache, I agree with you that there would be obstacles in deploying the method unless we could somehow convince all the browser vendors to support it.

OTOH, if we view this as a way to revalidate a stale cache in 0-RTT (see ii in #68 (comment)), then IMO it would still be useful even if not supported by all the web browsers.

My understanding is that most web developers dislike using versioned + long-lived filenames. They grudgingly use them as an optimization that reduces the time spent revalidating resources while keeping the freedom to update them. It should also be noted that the approach is not used optimally, developers often mark all of their asset files using a single version number. The number gets incremented when some of the asset files are changed, forcing the reload of all the asset files.
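The difference between a single site-wide version number and per-file versioning can be shown with a small sketch. The asset names, the `?v=` query format, and all function names here are hypothetical.

```python
import hashlib
from typing import Dict, List


def urls_single_version(assets: Dict[str, bytes], version: int) -> Dict[str, str]:
    """Every asset shares one version number, bumped on any change."""
    return {path: f"{path}?v={version}" for path in assets}


def urls_per_file(assets: Dict[str, bytes]) -> Dict[str, str]:
    """Each asset is versioned by a hash of its own content."""
    return {
        path: f"{path}?v={hashlib.sha256(body).hexdigest()[:8]}"
        for path, body in assets.items()
    }


def changed(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Paths whose URL changed, i.e. assets the browser must download again."""
    return sorted(path for path in before if before[path] != after[path])


old = {"/a.css": b"a { }", "/b.js": b"b()"}
new = {"/a.css": b"a { color: red }", "/b.js": b"b()"}  # only /a.css was edited

print(changed(urls_single_version(old, 1), urls_single_version(new, 2)))
print(changed(urls_per_file(old), urls_per_file(new)))
```

With the single version number both URLs change even though only `/a.css` was edited; with per-file hashes only `/a.css` gets a new URL.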

If the popular web browsers could support the proposed extension, it might become an attractive choice for web applications developers. They can stop using versioned + long-lived filenames to have asset files freshly cached on the client side. Instead they can have the resource files stale-cached. The files would be validated 0-RTT on the browsers that support the extension, or would spend 1 RTT on the browsers that do not (note: time spent for revalidating many asset files has greatly reduced in HTTP/2 thanks to the improved concurrency).

PS. Since this proposal can be implemented using a Service Worker (SW), it might be a good idea to implement one and see how it goes.

@igrigorik
Member

Apologies about the delay!

> OTOH, if we view this as a way to revalidate a stale cache in 0-RTT (see ii in #68 (comment)), then IMO it would still be useful even if not supported by all the web browsers.

Yes, I guess that's true.

> My understanding is that most web developers dislike using versioned + long-lived filenames.

I'm not sure about like or dislike. Rather, I think it's a practical strategy that gets you the benefit of long-caching and instant updates, which is why it's so popular.

> It should also be noted that the approach is not used optimally, developers often mark all of their asset files using a single version number. The number gets incremented when some of the asset files are changed, forcing the reload of all the asset files.

True, but that's an orthogonal issue.

> If the popular web browsers could support the proposed extension, it might become an attractive choice for web applications developers. The files would be validated 0-RTT on the browsers that support the extension, or would spend 1 RTT on the browsers that do not (note: time spent for revalidating many asset files has greatly reduced in HTTP/2 thanks to the improved concurrency).

My hunch is that developers won't be willing to take the penalty on browsers that don't support it; the penalty is too expensive and we can't expect support for this to happen overnight. Also, emitting an etag is its own can of worms: the server has to precompute the hash; many resources are cross-origin and it's effectively impossible to know their hash, etc.
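To make the server-side cost concrete, here is a hedged sketch in which hashes are precomputed once (for example at deploy time) and the proposed `etag` parameter is emitted only for assets the server knows about. `build_manifest` and `preload_link` are hypothetical names, and the unquoted 8-character token simply mirrors the examples earlier in the thread.

```python
import hashlib
from typing import Dict


def build_manifest(assets: Dict[str, bytes]) -> Dict[str, str]:
    """Hash every local asset once so request handling is a dict lookup."""
    return {path: hashlib.sha256(body).hexdigest()[:8] for path, body in assets.items()}


def preload_link(target: str, manifest: Dict[str, str]) -> str:
    """Build a Link header value, adding etag only when the hash is known."""
    link = f"<{target}>; rel=preload"
    etag = manifest.get(target)
    # Cross-origin or unknown targets get no etag: the server cannot know their hash.
    return f"{link}; etag={etag}" if etag else link


manifest = build_manifest({"/styles.css": b"body { margin: 0 }"})
print(preload_link("/styles.css", manifest))
print(preload_link("https://cdn.example/lib.js", manifest))
```

The second call shows the cross-origin case raised above: without access to the resource body, the header falls back to a plain preload.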

@yoavweiss
Contributor

If done as part of preload this seems to have security implications, e.g. it could create a "denial of refresh" attack of some third party content, maintaining it in a stale state.

I think such a mechanism, if needed, should be defined apart from preload, should be limited to content the server is authoritative for and should have thorough review of its security characteristics.

Closing as this has been open for a while without much activity, and seems out-of-scope.

@drzraf

drzraf commented Mar 30, 2017

Where exactly is the "denial of refresh" attack?

    Link: <https://third-party.com/styles.css>; rel=preload; etag=4921ef32

If the validator specified in the etag attribute is equal to that of a stale response in the client's cache, the client issues no request to the third-party server.

Otherwise, the client fetches said resource as it would have done without the etag attribute (based on max-age, expires, ...). By the way, it would not hurt to limit the scope in which an etag attribute is honored (e.g. same domain).
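A same-origin restriction of that kind could be sketched as follows. The function names are hypothetical, and "origin" is approximated here as the (scheme, host, port) triple, which is an assumption about how a client might scope the attribute rather than anything specified in this thread.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Approximate an origin as (scheme, host, port), filling in default ports."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def may_honor_etag(document_url: str, link_target: str) -> bool:
    """Honor the etag attribute only if the target shares the document's origin."""
    resolved = urljoin(document_url, link_target)  # resolve relative targets
    return origin(resolved) == origin(document_url)


page = "https://example.com/index.html"
print(may_honor_etag(page, "/styles.css"))
print(may_honor_etag(page, "https://third-party.com/styles.css"))
```

Under this rule the third-party example above would be preloaded normally, with its etag attribute ignored.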

NB: Is there a better place than preload for such a feature?

5 participants