
Proposal: mechanism to clear-cache on downstream change #234

Closed
jacksontj opened this issue Jun 8, 2019 · 4 comments
Assignees
Labels
1.0 release (Feature/Fix will be available in 1.0 Release) · caching (affects the caching layer) · delta-proxy (affects the delta proxy+cache) · enhancement (New feature or request)

Comments

@jacksontj
Contributor

As it stands today, Trickster simply caches the responses of the downstream Prometheus API. If the downstream's configuration changes in a way that alters the data behind it, Trickster will continue to serve "stale" data. Here are two example cases that highlight the problem:

(1) remote_read on a single prometheus host
If the prometheus host was configured to start pulling data from a remote_read endpoint all data in the trickster cache would be missing the "new" data from the remote_read endpoint.

(2) promxy downstreams change
As these systems scale they get more complex; a great example here is promxy. TL;DR (for those who need context): promxy stitches data from multiple Prometheus hosts together (presenting a single API endpoint, as well as "stitching" together timeseries with holes). So to set the scene:

  • promxy is configured to talk to 2 prometheus hosts configured to scrape the same targets
  • host 1 is missing data for a period, but promxy is stitching data with host 2 to fill the gaps
  • if host 2 were to become unavailable (restart, host dies, etc.) then the data promxy returns would have "holes" in it

This is fundamentally a distributed caching problem, since the source data isn't static. So instead of inventing a new solution, I propose we use standard HTTP cache headers. Specifically, I'm proposing that Trickster support a mix of ETag and cache-control headers from the downstreams, such that the downstream can determine (1) what the cache TTL should be and (2) the ETag used to detect when the data has changed.

For example, in this world promxy could return an ETag that is a SHA hash of its current configuration/state (downstreams, availability, etc.) combined with the query. This way, when the TTL expires, Trickster can send a request with the If-None-Match header, which gives the downstream the opportunity to either (1) return 304, meaning the cache entry is still good, or (2) return a fresh response.
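The proposed downstream behavior can be sketched as a small Go handler. Everything here is illustrative (not actual promxy code): the ETag is derived from a configuration/state fingerprint plus the query, so any config or availability change yields a new ETag and invalidates the cached entry on the next revalidation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// etagFor derives a strong ETag from the downstream's config/state
// fingerprint combined with the incoming query.
func etagFor(configState, query string) string {
	sum := sha256.Sum256([]byte(configState + "\x00" + query))
	return `"` + hex.EncodeToString(sum[:8]) + `"`
}

// handler answers 304 when the client's cached entry is still valid,
// otherwise returns a fresh response. The downstream picks the TTL.
func handler(configState string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		etag := etagFor(configState, r.URL.RawQuery)
		w.Header().Set("ETag", etag)
		w.Header().Set("Cache-Control", "max-age=300")
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified) // cache entry still good
			return
		}
		fmt.Fprint(w, `{"status":"success","data":{}}`) // fresh response
	}
}

func main() {
	srv := httptest.NewServer(handler("config-v1"))
	defer srv.Close()

	// First request: full response plus ETag.
	resp1, _ := http.Get(srv.URL + "/api/v1/query_range?query=up")
	etag := resp1.Header.Get("ETag")
	resp1.Body.Close()

	// Revalidation with If-None-Match: downstream answers 304.
	req, _ := http.NewRequest("GET", srv.URL+"/api/v1/query_range?query=up", nil)
	req.Header.Set("If-None-Match", etag)
	resp2, _ := http.DefaultClient.Do(req)
	resp2.Body.Close()
	fmt.Println(resp1.StatusCode, resp2.StatusCode) // 200 304
}
```

Because the config/state string is part of the hash, restarting or reconfiguring the downstream changes every ETag at once, forcing fresh responses for all cached queries on their next revalidation.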

To be clear, this doesn't entirely eliminate cache staleness; it just provides a mechanism for the downstreams to control how stale the caches get.

@jranson
Member

jranson commented Jun 10, 2019

Hey Thomas, I have an issue opened for much of this already (#143) but not in nearly as much detail as you've provided here! Thanks!

I'd like to start in 1.0 with a basic/generic reverse proxy object cache that respects all of the HTTP caching specifications, and when we have that fully vetted, we can move on to augmenting it to support evolving time series data cache management. So I think that means implementing #143 first, and then circling back to this issue once that part is completed. Does that sound OK to you?

One goal we have in Trickster is to support many different origin types, not just Prometheus. We may actually launch 1.0 with support for as many as 4 origin types (including Prom, which will always be the gold standard for Trickster). With that in mind, I want to make sure that any patterns we design here, as specific adaptations of what the HTTP RFCs permit to support linear time series data, are done in such a way that they can be adopted easily by other solutions (e.g., possible promxy and thanos equivalents for those other origin types).

In the case of promxy, can I propose we work towards a more basic approach that should be easily instrumented by both Trickster and promxy? It works like this: if promxy knows that the data it is serving in a specific request has holes in it (because it knows it couldn't get results from one of the configured hosts), it provides a Cache-Control: no-cache header in its response. Then, once we instrument the basic HTTP caching in Trickster, Trickster would serve the data flagged as no-cache to the end user, but skip storing it in the cache. That way, the affected data would continue to be requested fresh from promxy until the failed node is back up. Thoughts?

@jacksontj
Contributor Author

jacksontj commented Jun 10, 2019

It sounds like we're on the same page. #143 should handle all the regular Cache-Control headers (so we can define TTLs, etc.), and then the only addition in mine is to support ETag + If-None-Match requests. The mechanisms I'm describing are standard, RFC-compliant HTTP caching mechanisms, so hopefully we can re-use code to do it :)

As for using no-cache in the response on failure, I could do so, but my concern is the amplification of traffic during failure. For any of these failure modes, clearing the cache requires some increase in traffic to the downstream (since we have to fetch more data). If the TTL goes from 1h to 5m (let's say), that's a 12x increase. With a no-cache header it would potentially be significantly more (depending on incoming query volume), which might be more load than we want to throw at a downstream that is already in some sort of failure. Having the ability to do shorter TTLs + If-None-Match should give the best mix of control, since the downstream can implement a "cheap" way to determine whether the data has changed (presumably a mechanism that doesn't require actually fulfilling the query).

So, once #143 is implemented, downstreams could at least set shorter TTLs for degraded data, and If-None-Match support would allow significantly shorter TTLs without causing large spikes in downstream load. I did some looking around, and something like https://github.com/gregjones/httpcache might be helpful (at least as an example to mimic); it is an implementation of HTTP caching as an http.Transport. I don't know if you'll want to use it (since I imagine it doesn't fit into the new design), but from my quick read it appears to implement the caching properly.

So it sounds like this issue has now become "support If-None-Match requests" (dependent on #143).

@jranson added the 2.x release (Feature will be available in a future version of 2.x but not slated for 2.0), caching, delta-proxy, and enhancement labels on Jun 11, 2019
@jranson jranson self-assigned this Sep 6, 2019
@jranson
Member

jranson commented Jan 17, 2020

@jacksontj check out the sweet new Cache Control abilities in Trickster 1.0 Beta 10! This includes support for ETag <-> If-None-Match and Last-Modified <-> If-Modified-Since revalidations, as well as Expires, Cache-Control, and the like, between Trickster and the Origin (be it Promxy or anything else). We also support these capabilities between the end client/dashboard server and Trickster. I will keep this issue open for now for further discussion, but hopefully this will meet your needs. Thanks again for the detailed request, and for your patience while we got all of the kinks worked out!

@jranson added the 1.0 release label and removed the 2.x release label on Feb 7, 2020
@jranson
Member

jranson commented Feb 10, 2020

1.0 is now GA and we have not received additional activity on this issue since our previous comments. I will close this for now, but feel free to reopen it if there are any issues with the cache control features of Trickster 1.0.
