Cache trigger secrets for the duration of request #585
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @lawrencejones. Thanks for your PR. I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hey team! I'm totally new to the tekton codebase, so I've opened this PR as a draft and left tests out for now. Ideally you'd let me know if this approach looks viable to you, and if so, I'll add the tests and polish it. This addresses a comment I made the other day here: #406 (comment) Thanks!
Thank you for the PR! Sorry I missed your comment on the issue (we were on holiday for the last few days). I think I have a slight preference for using a caching client -- we should be able to use the knative Store to make things easier (like tektoncd/pipeline#2637). That being said, this PR is definitely a step in the right direction and if the other approach ends up being too complex, we can merge this and then iterate!
That sounds fine to me. Didn't want to introduce the cached client unless it was used elsewhere. I'll have a look at your example and give that a shot :)
Hi @dibyom - I might pick this up from Lawrence as my team's made time for investigating perf issues like this. It doesn't look like the knative library provides an equivalent to the …
Hey @tragiclifestories that sounds good to me! Thanks for working on it!
Hi there @dibyom, thanks for the reply. Having dug into the k8s client-go docs a little more, I've come to realise that implementing it that way is going to be a much bigger time sink than I had hoped. I have a branch on my fork that essentially finishes this PR off in its current implementation with tests and so on. Though I plan to take another stab at the informer version this afternoon, it would be great if the request cache version was acceptable as a stop-gap, since there's a good chance it makes a major difference to the performance of event listeners with large numbers of Git(hub|lab) triggers as it currently is. Sorry for blowing hot and cold on this, still getting my head around all the various moving parts in k8s client land!
Sure, this is definitely an improvement over what we have at the moment and since it's not an external-facing API change, we should be able to switch to another implementation later. Happy to review a PR with the request cache changes!
I'm going to close this in favor of the other two PRs mentioned!
@dibyom: Closed this PR. In response to this: "I'm going to close this in favor of the other two PRs mentioned!"
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Changes
This commit adds a request-local cache for interceptors to leverage
during the processing of triggers. It allows interceptors to avoid doing
expensive work more than once for each request, such as fetching a
Kubernetes secret for validating webhooks.
The implementation uses the request context to provide the cache. This
was the least disruptive way of providing a cache for use with
interceptors, and is appropriate given that the cache should only live
for the duration of each request.
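For illustration only, here is a minimal Go sketch of what a
request-scoped cache carried on the context could look like; the
package name and the `WithCache`, `FromContext`, and `GetOrLoad` names
are hypothetical, not the PR's actual API:

```go
package cache

import (
	"context"
	"sync"
)

// ctxKey is unexported so this package's context value cannot collide
// with values set by other packages.
type ctxKey struct{}

// Cache is a simple in-memory map guarded by a mutex. It is created once
// per incoming request and discarded when the request finishes.
type Cache struct {
	mu     sync.Mutex
	values map[string]interface{}
}

// WithCache returns a copy of ctx carrying a fresh, empty cache.
// The EventListener would call this once at the start of each request.
func WithCache(ctx context.Context) context.Context {
	return context.WithValue(ctx, ctxKey{}, &Cache{values: map[string]interface{}{}})
}

// FromContext returns the request cache, or nil if none was attached.
func FromContext(ctx context.Context) *Cache {
	c, _ := ctx.Value(ctxKey{}).(*Cache)
	return c
}

// GetOrLoad returns the cached value for key, computing and storing it
// with load on the first call. Interceptors can use this to fetch a
// webhook secret once per request instead of once per trigger.
func (c *Cache) GetOrLoad(key string, load func() (interface{}, error)) (interface{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.values[key]; ok {
		return v, nil
	}
	v, err := load()
	if err != nil {
		return nil, err
	}
	c.values[key] = v
	return v, nil
}
```

An interceptor could then wrap its secret lookup in `GetOrLoad`, keyed by
namespace and secret name, so only the first trigger in a request pays the
round trip to the API server.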
Alternative implementations might have used the client-go informers to
extend the Kubernetes client to watch for secrets in the cluster. This
would cause the work required to fetch secrets to scale with the number
of secrets in the cluster, as opposed to making a fresh request per
webhook we process. That said, building caching clients seems like more
work than is necessary for fixing this simple problem, which is why I
went with a simple cache object.
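For comparison, the informer-based alternative described above might look
roughly like the following with client-go's shared informer factory. This
is a sketch, not what this PR implements; the namespace and secret name
are placeholders:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster configuration; the EventListener's service account would
	// need permission to list and watch secrets.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The factory watches secrets and keeps a local cache in sync, so the
	// work scales with the number of secrets rather than with webhook volume.
	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)
	secretLister := factory.Core().V1().Secrets().Lister()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Lookups are now served from the informer's cache, not the API server.
	secret, err := secretLister.Secrets("default").Get("github-webhook-secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(secret.Name)
}
```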
The background for this change was finding GitHub webhooks timing out
once we exceeded ~40 triggers on our EventListener. While the CEL
filtering was very fast, the validation of GitHub webhook signatures was
being recomputed for every trigger, even though each trigger used the
same GitHub secret. Pulling the secret from Kubernetes was taking about
250ms, so 40 triggers (40 × ~250ms ≈ 10s) exceeded GitHub's 10s webhook
timeout.
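To make the per-trigger work concrete, the signature check each GitHub
trigger performs is roughly the HMAC comparison below (a sketch using
Go's standard library, not the interceptor's actual code). The comparison
itself is cheap; it was the ~250ms secret fetch repeated for every
trigger that approached the timeout, and that round trip is exactly what
the request cache removes.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// validSignature reports whether the webhook signature header matches the
// HMAC of the request body computed with the shared webhook secret.
// GitHub's X-Hub-Signature-256 header has the form "sha256=<hex digest>".
func validSignature(body, secret []byte, signatureHeader string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(signatureHeader, prefix) {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	expected := hex.EncodeToString(mac.Sum(nil))
	// Constant-time comparison of the expected and presented digests.
	return hmac.Equal([]byte(expected), []byte(signatureHeader[len(prefix):]))
}
```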