# TEP-0161: resolver caching (#1207)
Conversation
Looks good. I have a few questions, and a more general one: thinking a bit more about how this could/would interact with non-built-in resolvers (and whether it should).
> - Cache configuration should be flexible and configurable
> - Support different caching modes (always, never, auto)
> - Cache size and TTL should be configurable
> - Cache should be shared across resolvers
How do we see this working out? I "guess" the "key" for a cache would be, as said below I think, the digest of an image, or the revision. This makes sharing across resolvers slightly impossible, no? Or does it mean "it should be available for all and shared in terms of storage", and not in terms of cache hits?
I'm doing it here:
https://github.com/brianwcook/pipeline/blob/7df3f089b9da1b9c24ab0b78faf69a09e71f2ba4/pkg/remoteresolution/cache/cache.go#L144
by creating a string representation of the params struct and then hashing it. The one thing left to do is to check for variables that should be removed; `cache` is one of them, because, for example, changing the value of `cache` from `auto` to `always` should not affect the cache key.
Generating the key this way, though, gives us a simple way to generate cache keys for non-immutable references.
> This makes sharing across resolvers slightly impossible

Yes, this was not a goal. I'm not sure that makes sense; I'd have to think about it.
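For illustration, the approach described above (hash a deterministic string form of the params, excluding control params like `cache`) could be sketched like this. This is a simplified sketch, not the code from the linked PR; the function and param names are assumptions:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// cacheKey builds a deterministic key from the resolver type and its params.
// Control params like "cache" are excluded, so flipping cache from "auto" to
// "always" does not change the key. (Hypothetical sketch for illustration.)
func cacheKey(resolverType string, params map[string]string) string {
	excluded := map[string]bool{"cache": true}
	keys := make([]string, 0, len(params))
	for k := range params {
		if !excluded[k] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // map iteration order is random; sort for determinism
	var b strings.Builder
	b.WriteString(resolverType)
	for _, k := range keys {
		fmt.Fprintf(&b, ";%s=%s", k, params[k])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	k1 := cacheKey("git", map[string]string{"url": "https://example.com/r.git", "revision": "main", "cache": "auto"})
	k2 := cacheKey("git", map[string]string{"url": "https://example.com/r.git", "revision": "main", "cache": "always"})
	fmt.Println(k1 == k2) // true: the cache param does not affect the key
}
```

Note that because the key is just a hash of the params, it works the same for mutable references (a branch name) as for immutable ones (a digest), which is the property the comment relies on.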
Makes sense; I don't think we want to share the cache across resolvers as a goal anyway.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: resolver-cache-config
  namespace: tekton-pipelines-resolvers
data:
  max-size: "1000"   # Maximum number of cache entries
  default-ttl: "5m"  # Default time-to-live for cache entries
```
Do we see value in having different TTLs depending on the resolver? (e.g. 5m for the bundle resolver, but 30m for the git one, …)
Personally, I don't see value in it, and I believe to do that we would have to instantiate a cache per resolver. Even a cache with a short TTL like 5 minutes should deliver relief from the kind of problem that I see:
- a user triggers some build automation that starts a large number of pipelines (over 100)
- all pipelines try to resolve around the same period of time
- all pipelines are very similar, so resolving just a few of them would populate the cache for the rest of them
> Three caching modes will be supported:
>
> 1. `always`: Always use cache regardless of resource type
> 2. `never`: Never use cache
This would disable the "automatically enabled cache", right? Essentially the default value would be `auto`?
Yes, I intend for the default to be `auto`, as it only caches immutable references, so it shouldn't surprise anyone or break any existing usage.
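The "only caches immutable references" decision in `auto` mode could be sketched roughly as follows. The resolver names (`bundles`, `git`) and param keys are assumptions for illustration, not the draft PR's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// isImmutableRef sketches the check "auto" mode could perform before caching:
// only references that can never change content are safe to cache by default.
func isImmutableRef(resolverType string, params map[string]string) bool {
	switch resolverType {
	case "bundles":
		// an OCI image reference pinned by digest never changes
		return strings.Contains(params["bundle"], "@sha256:")
	case "git":
		// a full 40-character commit SHA is immutable; branch and tag
		// names like "main" or "v1.0" can move
		rev := strings.ToLower(params["revision"])
		if len(rev) != 40 {
			return false
		}
		for _, c := range rev {
			if !strings.ContainsRune("0123456789abcdef", c) {
				return false
			}
		}
		return true
	}
	return false
}

func main() {
	fmt.Println(isImmutableRef("git", map[string]string{"revision": "main"})) // false: branch names can move
}
```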
> Three caching modes will be supported:
>
> 1. `always`: Always use cache regardless of resource type
I am not sure what it means in practice? For example, how would this act with the hub resolver, or even a non-built-in resolver?
It means whatever was resolved, that YAML will live until the TTL expires or it gets knocked out of the cache (LRU), for both immutable and non-immutable references. This is why I set a relatively short default TTL (5m).
Other things that maybe are not as obvious as I thought:
- caching is set to auto by default for resolvers where caching is enabled (currently, in my draft PR, the bundle and git resolvers)
- other resolvers work just like they did before
- the user adds `cache: always/never` to their task parameters to override the default
- once this feature is added, task authors can start accepting a cache variable from the PipelineRun and default to auto, so that a user could set `cache: always/never` on their PipelineRun to control caching behavior
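Overriding the default on a PipelineRun that uses the git resolver might then look like this. This is a hedged sketch: the `cache` param follows the draft PR's description above, while the repository URL and `pathInRepo` are made-up placeholders:

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-run
spec:
  pipelineRef:
    resolver: git
    params:
      - name: url
        value: https://github.com/example/catalog.git   # placeholder repo
      - name: revision
        value: main            # a mutable reference, so "auto" would not cache it
      - name: pathInRepo
        value: pipelines/build.yaml   # placeholder path
      - name: cache
        value: always          # override: cache even this mutable revision
```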
SGTM.
@waveywaves @afrittoli @AlanGreene @twoGiants can I get some reviews ?
TEP looks great. Not much to add. Just added my thoughts to the open questions.
> 3. No caching
>    - Rejected: Would not address performance and reliability concerns
>
> ## Open Questions
- Is it necessary to add a cache statistics API?
  Could be useful for tuning the cache; otherwise it would be "just playing around" with the configuration. Do you have a use for it in your scenario?
- There is no way to fetch resources from HTTP servers with an immutable reference. HTTP servers rarely rate limit in the same way as registries and git forges. Is caching for the HTTP resolver necessary?
  I agree, there is rarely a rate limit. Maybe it's useful for company-internal shared resources to reduce internal traffic, but I tend towards: not necessary.
- Same question for the Hub resolver?
- And, same question for the Cluster resolver? Caching seems unnecessary here.
  Agree, I don't think it's needed here.
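On the first open question, the statistics API could stay very small if it only tracks hits and misses for tuning. A sketch with hypothetical names (nothing here comes from the PR):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Stats is a hypothetical, minimal cache statistics surface: enough to
// compute a hit rate when tuning max-size and TTL, and nothing more.
type Stats struct {
	hits, misses atomic.Int64
}

func (s *Stats) RecordHit()  { s.hits.Add(1) }
func (s *Stats) RecordMiss() { s.misses.Add(1) }

// HitRate returns the fraction of lookups served from cache (0 if none yet).
func (s *Stats) HitRate() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	var s Stats
	s.RecordHit()
	s.RecordHit()
	s.RecordMiss()
	fmt.Printf("%.2f\n", s.HitRate()) // 0.67
}
```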
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: twoGiants, vdemeester. The full list of commands accepted by this bot can be found here; the pull request process is described here.
@vdemeester a small API could be provided for custom resolvers which want caching. Not sure it's really necessary though.
Yeah, I think it's not necessary, now at least; we can go ahead with the proposal and see later if this is something that we need.
Force-pushed 9322300 to a186eeb (Signed-off-by: Brian Cook <bcook@redhat.com>)
I need the lgtm label please 🙏
/lgtm
This is a draft TEP for the work I am doing in tektoncd/pipeline#8825 to add caching to the resolvers, primarily the git and bundle resolvers.