TEP-0161: resolver caching #1207


Merged: 1 commit into tektoncd:main, Jul 8, 2025

Conversation

@brianwcook (Contributor):

This is a draft TEP for the work I am doing in tektoncd/pipeline#8825 to add caching to the resolvers, primarily git and bundle resolvers.

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 16, 2025
@vdemeester vdemeester self-assigned this Jun 16, 2025
@vdemeester (Member) left a comment:

Looks good. I have a few questions, and a more general one: how could/would this interact with non-built-in resolvers (and should it)?

- Cache configuration should be flexible and configurable
- Support different caching modes (always, never, auto)
- Cache size and TTL should be configurable
- Cache should be shared across resolvers
@vdemeester (Member):

How do we see this working out? I "guess" the "key" for a cache would be, as said below I think, the digest of an image, or the revision. This makes sharing across resolvers slightly impossible? Or does it mean "it should be available for all and shared in terms of storage", and not in terms of "cache hits"?

@brianwcook (Contributor, Author) commented Jun 18, 2025:

I'm doing it here:
https://github.com/brianwcook/pipeline/blob/7df3f089b9da1b9c24ab0b78faf69a09e71f2ba4/pkg/remoteresolution/cache/cache.go#L144

By creating a string representation of the params struct and then hashing it. The one thing left to do is to check for params that should be removed; `cache` is one of them because, for example, changing its value from `auto` to `always` should not affect the cache key.

Generating the key this way, though, gives us a simple way to generate cache keys for non-immutable references.
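A minimal sketch of that approach in Go (the names, types, and exclusion list here are illustrative assumptions, not the actual code from the PR):

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Param is a simplified stand-in for a resolver's name/value parameters.
type Param struct {
	Name  string
	Value string
}

// cacheKey hashes a canonical string form of the params, skipping params
// (like "cache") whose value must not influence the key.
func cacheKey(resolverType string, params []Param) string {
	skip := map[string]bool{"cache": true}

	kept := make([]Param, 0, len(params))
	for _, p := range params {
		if !skip[p.Name] {
			kept = append(kept, p)
		}
	}
	// Sort by name so that param order never changes the key.
	sort.Slice(kept, func(i, j int) bool { return kept[i].Name < kept[j].Name })

	h := sha256.New()
	fmt.Fprintf(h, "%s:", resolverType)
	for _, p := range kept {
		fmt.Fprintf(h, "%s=%s;", p.Name, p.Value)
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

With this shape, the key is the same whether the user sets `cache: auto` or `cache: always`, which is the behavior described above.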

> This makes sharing across resolvers slightly impossible?

Yes, this was not a goal. I'm not sure that makes sense. I'd have to think about it.

@vdemeester (Member):

Makes sense; I don't think we want to share the cache across resolvers as a goal anyway.

Comment on lines 51 to 60
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: resolver-cache-config
  namespace: tekton-pipelines-resolvers
data:
  max-size: "1000"    # Maximum number of cache entries
  default-ttl: "5m"   # Default time-to-live for cache entries
```
@vdemeester (Member):

Do we see value in having different TTLs depending on the resolver? (e.g. 5m for the bundle resolver, but 30m for the git one, …)

@brianwcook (Contributor, Author):

Personally, I don't see value in it, and I believe that to do that we would have to instantiate a cache per resolver. Even a cache with a short TTL like 5 minutes should deliver relief from the kind of problems that I see:

- a user triggers some build automation that starts a large number of pipelines (over 100)
- all pipelines are trying to resolve around the same period of time
- all pipelines are very similar, so resolving just a few of them would populate the cache for the rest of them.

Three caching modes will be supported:

1. `always`: Always use cache regardless of resource type
2. `never`: Never use cache
@vdemeester (Member):

This would disable the "automatically enabled cache", right? Essentially the default value would be `auto`?

@brianwcook (Contributor, Author):

Yes, I intend for the default to be `auto`: it only caches immutable references, so it shouldn't surprise anyone or break any existing usage.
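As a rough illustration of what `auto` could treat as immutable (the heuristics below are my assumptions, not taken from the draft PR): a bundle pinned by digest, or a git revision that is a full commit SHA.

```go
package cache

import "regexp"

var (
	// A full 40-hex-character git commit SHA (assumed heuristic).
	gitCommitSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)
	// An OCI reference pinned by digest, e.g. "registry/repo@sha256:<64 hex>".
	digestRef = regexp.MustCompile(`@sha256:[0-9a-f]{64}$`)
)

// isImmutable reports whether a request points at content that can never
// change; in "auto" mode, only such requests would be cached.
func isImmutable(resolverType string, params map[string]string) bool {
	switch resolverType {
	case "git":
		return gitCommitSHA.MatchString(params["revision"])
	case "bundles":
		return digestRef.MatchString(params["bundle"])
	default:
		// No immutability signal for other resolvers, so auto skips caching.
		return false
	}
}
```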


Three caching modes will be supported:

1. `always`: Always use cache regardless of resource type
@vdemeester (Member):

I am not sure what it means in practice. For example, how would this act with the hub resolver, or even a non-built-in resolver?

@brianwcook (Contributor, Author) commented Jun 18, 2025:

It means that whatever was resolved, that YAML will live until the TTL expires or it gets knocked out of the cache (LRU), for both immutable and non-immutable references. This is why I set a relatively short default TTL (5m).
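A hand-rolled sketch of those semantics (the PR may well use an existing cache library instead): entries expire after the TTL, and the least recently used entry is evicted when the cache is full.

```go
package cache

import (
	"container/list"
	"sync"
	"time"
)

type entry struct {
	key     string
	yaml    []byte
	expires time.Time
}

// LRUTTLCache evicts the least recently used entry when full and
// ignores entries whose TTL has passed.
type LRUTTLCache struct {
	mu      sync.Mutex
	maxSize int
	ttl     time.Duration
	order   *list.List               // front = most recently used
	items   map[string]*list.Element // key -> element in order
}

func New(maxSize int, ttl time.Duration) *LRUTTLCache {
	return &LRUTTLCache{
		maxSize: maxSize,
		ttl:     ttl,
		order:   list.New(),
		items:   make(map[string]*list.Element),
	}
}

func (c *LRUTTLCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	e := el.Value.(*entry)
	if time.Now().After(e.expires) {
		// Expired: drop the entry and report a miss.
		c.order.Remove(el)
		delete(c.items, key)
		return nil, false
	}
	c.order.MoveToFront(el)
	return e.yaml, true
}

func (c *LRUTTLCache) Add(key string, yaml []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.order.Remove(el)
		delete(c.items, key)
	}
	if c.order.Len() >= c.maxSize {
		// Evict the least recently used entry.
		if back := c.order.Back(); back != nil {
			delete(c.items, back.Value.(*entry).key)
			c.order.Remove(back)
		}
	}
	e := &entry{key: key, yaml: yaml, expires: time.Now().Add(c.ttl)}
	c.items[key] = c.order.PushFront(e)
}
```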

@brianwcook (Contributor, Author):

Other things that are maybe not as obvious as I thought:

- caching is set to `auto` by default for resolvers where caching is enabled (currently, in my draft PR, the bundle and git resolvers)
- other resolvers work just like they did before
- the user adds `cache: always/never` to their task parameters to override the default (see the sketch below)
- once this feature is added, task authors can start accepting a `cache` variable from the PipelineRun and default to `auto`, so that a user could set `cache: always/never` on their PipelineRun to control caching behavior.
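For illustration only (the exact parameter name and where it is accepted could differ in the final implementation), overriding the default on a git-resolved pipeline might look like:

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-run
spec:
  pipelineRef:
    resolver: git
    params:
      - name: url
        value: https://github.com/example/repo.git
      - name: revision
        value: main
      - name: pathInRepo
        value: pipeline.yaml
      - name: cache
        value: never   # override the default "auto" mode
```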

@vdemeester (Member) left a comment:

SGTM.
@waveywaves @afrittoli @AlanGreene @twoGiants can I get some reviews?


@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2025
@twoGiants (Member) left a comment:

TEP looks great. Not much to add. Just added my thoughts to the open questions.

3. No caching
- Rejected: Would not address performance and reliability concerns

## Open Questions
@twoGiants (Member):

> 1. Is it necessary to add a cache statistics API?

Could be useful for tuning the cache; otherwise it would be "just playing around" with the configuration. Do you have a use for it in your scenario?

> 2. There is no way to fetch resources from HTTP servers with an immutable reference. HTTP servers rarely rate limit in the same way as registries and git forges. Is caching for the HTTP resolver necessary?

I agree, there is rarely a rate limit. Maybe it's useful for company-internal shared resources to reduce internal traffic. But I tend to => not necessary.

> 3. Same question for the Hub resolver?
>
> 4. And, same question for the Cluster resolver? Caching seems unnecessary here.

Agree, I don't think it's needed here.

@tekton-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: twoGiants, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@twoGiants (Member) commented Jul 1, 2025:

> ... how this could/would interact with non-built-in resolvers? (and if it should)

@vdemeester a small API could be provided for custom resolvers which want caching. Not sure it's really necessary, though.
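Purely as a thought experiment (no such API exists in the PR), the opt-in surface for custom resolvers could be as small as:

```go
package cache

import (
	"context"
	"time"
)

// CacheableResolver is a hypothetical opt-in interface that a custom
// resolver could implement to participate in the shared caching layer.
type CacheableResolver interface {
	// CacheKey returns a stable key for a request, or "" to bypass the cache.
	CacheKey(ctx context.Context, params map[string]string) string
	// CacheTTL lets the resolver override the globally configured TTL.
	CacheTTL() time.Duration
}
```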

@vdemeester (Member):

> > ... how this could/would interact with non-built-in resolvers? (and if it should)
>
> @vdemeester a small API could be provided for custom resolvers which want caching. Not sure it's really necessary, though.

Yeah, I think it's not necessary now, at least; we can go ahead with the proposal and see later if this is something that we need.

Signed-off-by: Brian Cook <bcook@redhat.com>
@brianwcook brianwcook force-pushed the TEP-resolver-caching branch from 9322300 to a186eeb on July 7, 2025 16:11
@brianwcook brianwcook changed the title from "Draft TEP: resolver caching" to "TEP-0161: resolver caching" on Jul 7, 2025
@brianwcook (Contributor, Author):

I need lgtm label please 🙏

@afrittoli (Member):

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 8, 2025
@tekton-robot tekton-robot merged commit 6c1149e into tektoncd:main Jul 8, 2025
3 checks passed
@brianwcook brianwcook deleted the TEP-resolver-caching branch July 8, 2025 16:57