Increase Cache-Control: max-age= to 1 year instead of 1 month #186

Closed
getify opened this issue Jun 29, 2019 · 12 comments

Labels: completed (Feature or request has been completed), enhancement (New feature or request)

@getify

getify commented Jun 29, 2019

I certainly understand not wanting to allow an option to reduce the cache length, as many would probably want. But I have the opposite request: an option to extend the caching beyond the normal (30 days?) length?

My use case is images that are stored in github repos, where the URL includes a specific git commit hash, such as this one:

https://raw.githubusercontent.com/getify/You-Dont-Know-JS/06de0535b240f0d5be8971fbb93d474f0195f767/up%20%26%20going/fig1.png

These images, at these kinds of URLs, will never change, by virtue of how git works. If they change, the URL itself changes. They could be deleted, but that's a separate issue (the in-progress feature for an API to remove from cache).
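
A rough way to recognise such commit-pinned URLs programmatically is sketched below. The helper name `is_commit_pinned` is hypothetical, and the path layout (`/<owner>/<repo>/<ref>/<file>`, with the ref being a full 40-character SHA-1) is an assumption inferred from the example URL above, not something GitHub guarantees.

```python
import re
from urllib.parse import urlparse

# Assumption: a pinned ref is a full, lowercase, 40-hex-digit SHA-1.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_commit_pinned(url: str) -> bool:
    """Return True if a raw.githubusercontent.com URL names a full commit hash."""
    parsed = urlparse(url)
    if parsed.hostname != "raw.githubusercontent.com":
        return False
    # Assumed path layout: /<owner>/<repo>/<ref>/<file path...>
    segments = parsed.path.lstrip("/").split("/")
    if len(segments) < 4:
        return False
    return _FULL_SHA.fullmatch(segments[2]) is not None
```

A URL that uses a branch name such as `master` in the ref position would return False here, since its content can change without the URL changing.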

For some reason, GitHub (via their "raw.githubusercontent" CDN) returns very short caching headers, so linking that image directly creates unnecessary reloads for users. If I route these URLs through your service instead, 30-day caching headers are applied, which is much better for users.

But ideally, I could add an option like "cache-forever" or "long-cache" or something like that, where the image caching header will be 1 year (or even longer), for cases where it's known that the image will never change.
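
To make the request concrete, here is a minimal sketch of how such an option might select the header value. The function name, the `long_cache` flag, and the `public` directive are all assumptions for illustration; nothing here reflects how the service is actually implemented.

```python
ONE_DAY = 24 * 60 * 60              # seconds in a day
DEFAULT_MAX_AGE = 30 * ONE_DAY      # the roughly 30-day default discussed above
LONG_MAX_AGE = 365 * ONE_DAY        # the requested one-year lifetime


def cache_control_value(long_cache: bool = False) -> str:
    """Build a Cache-Control value; `long_cache` is a hypothetical opt-in flag."""
    max_age = LONG_MAX_AGE if long_cache else DEFAULT_MAX_AGE
    # `public` is an assumption; the thread only discusses max-age.
    return f"public, max-age={max_age}"
```

With the flag off this yields `max-age=2592000` (30 days); with it on, `max-age=31536000` (365 days).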

Is this something you would consider adding?

@andrieslouw andrieslouw self-assigned this Jun 29, 2019
@andrieslouw
Member

Nah, I'm unsure if browsers will even try to keep the files that long in cache, browsers do clean up files, even when they are not expired yet. Besides, we don't want to be the one "hosting" the images. If, for some reason, the original file disappears, we need to remove it too, to prevent copyright issues.

The short caching headers for GitHub could be to negatively affect hotlinking, or to prevent people from using them to just host files in the first place.

You're free to copy our code and modify it to lengthen the cache-expiry, but it is not something that would benefit anyone as far as I can see.

@andrieslouw andrieslouw added wontfix This will not be worked on enhancement New feature or request labels Jun 29, 2019
@wei

wei commented Jun 29, 2019

GitHub-specific free CDN services:

  1. https://raw.githack.com
  2. https://www.jsdelivr.com/?docs=gh
  3. https://gitcdn.xyz

The first one caches for one year.

Clarification: The first one caches the CDN Url for 1 year.

Requests to CDN are routed through CloudFlare's content delivery network, and are cached for a year the first time they're loaded.

This is the header used:

cache-control: max-age=315360000
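
Since max-age is expressed in seconds, a quick conversion helps when reading values like this one. The helper below is only an illustrative sketch; assuming the header is quoted accurately, 315360000 seconds works out to 3650 days (about ten years), whereas one year is 31536000.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_age_in_days(seconds: int) -> float:
    """Convert a Cache-Control max-age value (seconds) to days."""
    return seconds / SECONDS_PER_DAY


print(max_age_in_days(31536000))    # 365.0
print(max_age_in_days(315360000))   # 3650.0
```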

@getify
Author

getify commented Jun 29, 2019

@andrieslouw

I'm sorry but your response is baffling to me. I can't make even a shred of sense from that reasoning. I'm surprised because it's the opposite of what I would have expected.

> ...unsure if browsers will even try to keep the files that long in cache, browsers do clean up files, even when they are not expired yet

It doesn't matter if the browsers keep it for 31 days, 75 days, 6 months, or a full year. The point is not keeping it for a specific length.

The point is, consistent with common web performance guidelines, we should be using "far future" caching headers for any resource which, by virtue of its name and how you manage resources, will "never change". 1 year is typically the suggested length for "far future" expires, but the particular length chosen is not that relevant.

What we're trying to do in such cases is just reduce the chances that someone's browser would re-request a resource unnecessarily. It's for the user's benefit.

As far as I can tell, CDNs typically do use longer caching headers for exactly this reason.

I would expect a service that does CDN caching like behaviors to be familiar with these "best practices", so I'm surprised and dismayed that you claim it's "not something that would benefit anyone".

But also, if you were running a CDN service (like you are) I would think you would ALSO benefit from this... in other words, the space on the drive to hold the cached resource isn't the expense you're typically worried about, but the bandwidth to serve it and how often that is requested from web users. The longer the cache, the less likely to have "extra" requests for it. Which is better for the CDN, too.

If you added an option like this, more people could extend the cache length on your service for their unchanging images, which would tend to decrease the amount of times that resource was re-requested for a user. Wouldn't that help you?

The only way long cache life works against anyone is against resources that need to be updated. That's the big tradeoff in all caches of any form. But you can know that by virtue of how these particular kinds of resources work, that is never the case, so there's zero downside and all upside.

I can't understand why something that's good for users and good for the service/CDN isn't at least something you'd want to consider? It seems like a win-win no brainer from my perspective.

> Besides, we don't want to be the one "hosting" the images.

What? Again, I'm baffled here.

The whole point of your service is to provide an image caching CDN/proxy, right? I'm not looking for a hosting service, I'm looking for a caching proxy CDN for images.

A hosting service is one where I would proactively upload resources (including whole web pages), and attach a DNS record to those resources so end users visited my site from your servers.

That's not even remotely what's being suggested. Resources would only get on your server when they were being cached and served. That's not "hosting" by any reasonable definition. It's caching. So I can't understand why you'd bring up that comparison?

> If, for some reason, the original file disappears, we need to remove it too, to prevent copyright issues.

I'm still shaking my head not understanding your reasoning. Why is this response specifically relevant to my request?

You have exactly the same problem with images you cache for 30 or 60 days as ones you might cache for 365 days, which is that if they get deleted (or have to be taken down for legal copyright issues), the owner might want you to purge your cache. There's nothing about 365 days that is more vulnerable to that problem than 30.

Besides, you've had an open ticket here for several years asking for an API to purge a resource from the cache, and it looks like it may actually happen at some point. If it did, that would solve the issue you bring up, equally for a 365-day cached item as for a 30-day cached item.

So how is that a salient point against this feature request?


How did you pick default 30 days in the first place? That choice is as arbitrary as 365 would be. Browsers don't often keep files for 30 days, and files can be deleted well before 30 days have gone by.

The point of providing all the options you offer is to make your service more useful in specific scenarios. Unchanging images are a perfectly valid and common scenario (whether they are from GitHub or not), and supporting them seems like something everyone would benefit from.

@getify getify closed this as completed Jun 29, 2019
@getify
Author

getify commented Jun 30, 2019

@wei BTW, your info is incorrect. Those 3 aren't suitable options, and aren't better than what I thought this service was intended to be: a dedicated image caching CDN.

  1. raw.githack now only serves content with a 5 minute (300s) cache length, not one year. Also, for images, they just redirect to the raw.githubusercontent URLs, so they don't even cache them at all.

  2. jsdelivr does appear to send a far-future expires header, but their terms of service say they're for open source projects, so they're optimized for delivering things like .js files. That's not necessarily the best option for CDN caching of images.

  3. gitcdn.xyz does do its own caching of images, but it still serves them with a 5 minute cache length.
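
Claims like these can be checked by reading the max-age directive out of each service's Cache-Control response header. The parser below is a simplified sketch (the name `parse_max_age` is hypothetical, and it ignores edge cases such as duplicate directives); it works on the header string only and does not perform any network request.

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip().strip('"')
        # Match only the plain max-age directive, not s-maxage.
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None
```

A 5 minute cache length shows up as `parse_max_age("max-age=300") == 300`, and a one-year header as 31536000.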

Moreover, even if I did use any of those services, they don't offer any of the image-specific services this one does, such as converting to a different image format based on an option parameter. The fact that this is an image caching service seemed ideal for me since I have images to cache, and would like to benefit potentially from some of the image-specific options available.

@getify
Author

getify commented Jun 30, 2019

> could be to negatively affect hotlinking

I can't see how shorter caching headers do anything to discourage hotlinking. If a party is hotlinking to a resource (which they don't have permission to do), why would they care if it was slower for users to keep re-requesting unnecessarily? The only party that harms is the user, not the offending hotlinker.

> to prevent people from using them to just host files in the first place.

If github didn't want files accessible in a direct web linked format, they wouldn't have raw.githubusercontent at all in the first place. The fact that they have such a service means they understand there are valid usages for linking to web renderable resources that sit in git repos.

@andrieslouw
Member

I appreciate the lengthy response. First and foremost, this service is free, and we try to support typical use cases. There is not much benefit in increasing from one month to one year.

According to Facebook's research, browsers will keep resources in cache for around 47 hours: https://code.fb.com/web/web-performance-cache-efficiency-exercise/

What is left, then, is the amount of time the resource will stay on our disks, or those of CloudFlare. Due to the number of requests, neither we nor CloudFlare will even try to use disks for this purpose, not even SSDs; everything is kept in RAM. This is already expensive, so there is not much room for resources, and it isn't possible to keep everything for a year; we serve around 10TB per day. But if a resource is requested often, we will do our best to keep it for a month and prioritize it above other resources.

And yes, I know it's called caching, not "hosting", but the point is that we don't want to risk having files (in RAM) much longer than the origin has them available. It would mean that we actually need to respond (manually) to legal requests, whereas now everything resolves itself within a month.

Really short caching headers will discourage hotlinking: if you harm the user, you harm the service being used. That is why we chose to cache in the first place; there is benefit to a month of caching, and it is something we were able to do with (relatively) limited resources.

I did notice you wanted it to be an option; but if we provide something that states it will keep the resource longer in cache, it needs to do exactly that, for typical users, with typical browsers. I will try to reconsider this, so I will look for some recent research to see if typical browsers will keep things longer than one month if requested to do so. Will also need to check our stack (the cache will be cleared almost monthly due to updates on our servers), and how long CloudFlare is willing to keep anything.

@andrieslouw andrieslouw added triage This issue is being investigated and removed wontfix This will not be worked on labels Jun 30, 2019
@getify
Author

getify commented Jun 30, 2019

I am far less concerned with how long you (or cloudflare) keep something in RAM. That's not the point of the feature request.

The point is, the response headers sent out to the end user's browser could be lengthened, so as to indicate to the user's browser that the image at that exact URL is never going to change (since it won't).

Maybe their browser only keeps it for 47 hours, or maybe it's in their cache for 3 months. Who knows how long it's really there? But at least the headers are telling the browser, "hey, you can safely keep this for a really long time." That's the correct and semantic message to be sending for these types of images.
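
The distinction can be sketched as follows: max-age bounds how long a browser may reuse a copy it still holds, while eviction is a separate decision the browser makes on its own. The function name and the use of `None` to represent an evicted entry are illustrative assumptions, and real browser caches apply more rules than this.

```python
from typing import Optional

ONE_DAY = 24 * 60 * 60


def needs_network_request(age_seconds: Optional[int], max_age: int) -> bool:
    """Decide whether a request must go out to the network.

    `age_seconds` is None when the browser no longer holds a copy (evicted).
    """
    if age_seconds is None:
        return True                   # evicted: refetch regardless of max-age
    return age_seconds >= max_age     # stale copy: revalidate or refetch
```

For a copy that has survived 45 days in the cache, a 30-day header forces a new request, while a one-year header lets the browser reuse it. If the copy was evicted after 47 hours, both headers behave the same.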

I certainly understand your infrastructure wanting to only physically keep things around that are frequently used and such. I'm not trying to constrain the internal implementation.

I think the CDN would benefit from knowing which images claim to never change, as there are potentially different physical caching strategies that might apply to those compared with ones that could change more frequently (maybe these are put on disk, etc).

BUT... that's all just speculative, and not relevant to the feature request I made, which is mostly concerned with the outbound response headers.

@andrieslouw
Member

andrieslouw commented Jun 30, 2019

Discussed the topic of edge cache with CloudFlare: they will truncate resources after one month if you set anything longer than one month. So for them it doesn't matter. Testing shows that our internal cache will truncate way before one month.

Which leaves us with the browser, and the max-age we serve. I will check what is common in the industry nowadays, because it shouldn't matter if we serve one month or one year, so it may be worthwhile to just change the default for everyone, instead of introducing another parameter.
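
The reasoning above can be sketched as two independent numbers: the edge retention, which is capped, and the max-age sent to the browser, which is not. The names are hypothetical, and treating "one month" as exactly 30 days is an assumption; the real cap is whatever CloudFlare applies.

```python
ONE_DAY = 24 * 60 * 60
EDGE_TTL_CAP = 30 * ONE_DAY   # "one month", approximated as 30 days (assumption)


def effective_edge_ttl(requested_ttl: int, cap: int = EDGE_TTL_CAP) -> int:
    """Edge retention is the requested TTL, truncated to the cap."""
    return min(requested_ttl, cap)


browser_max_age = 365 * ONE_DAY                  # what the response header says
edge_ttl = effective_edge_ttl(browser_max_age)   # what the edge will honour
```

Under these assumptions, raising the header to one year changes only `browser_max_age`; `edge_ttl` stays at 30 days either way.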

@andrieslouw andrieslouw reopened this Jun 30, 2019
@andrieslouw andrieslouw changed the title Add an option to increase cache length Increase Cache-Control: max-age= to 1 year instead of 1 month Jun 30, 2019
@andrieslouw andrieslouw added the started This issue is being worked on label Jun 30, 2019
@andrieslouw
Member

Made some changes, please help with testing.

@andrieslouw andrieslouw removed the triage This issue is being investigated label Jun 30, 2019
@andrieslouw
Member

@getify: I'm sorry for the confusion. Reading back, I had understood that you wanted to change how long we cache things (increase the time in cache), but all you wanted was to change the max-age= value reported at the far end of our stack, the server your browser eventually connects to. I was really worried about the implications for the rest of our stack, which was the reason for my initial reluctance. Thank you for explaining your request further, and I'm sorry for being a "back end" type.

@getify
Author

getify commented Jul 1, 2019

Just tried it out, and indeed the caching headers are 1 year now. Thanks so much!

@andrieslouw andrieslouw added completed Feature or request has been completed and removed started This issue is being worked on labels Aug 3, 2019
@kleisauke
Member

The duration can now also be configured in API version 5 (#189). See:
https://images.weserv.nl/docs/format.html#cache-control
