Cache secrets in interceptors with Reflector #594
Hi @tragiclifestories. Thanks for your PR. I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
pkg/interceptors/interceptors.go
Outdated
Get(sr triggersv1.SecretRef) ([]byte, error)
}

type webhookSecretStore struct {
To address your question, maybe just secretStore would work?
Yes, which of course was what it was when I first wrote this struct def out 😅
pkg/interceptors/interceptors.go
Outdated
ns = eventListenerNamespace
// Get returns the secret value for a given SecretRef.
func (ws *WebhookSecretStore) Get(sr triggersv1.SecretRef) ([]byte, error) {
cachedObj, ok, _ := ws.store.GetByKey(getKey(sr))
Not very familiar with the cache store, so curious why we're ignoring the error here?
It turns out that if you spelunk into the code, error is always nil from this particular store implementation. However, I will probably handle it anyway, since it may turn out that a different store makes more sense in future.
9ae0f53
to
6eacf24
Compare
pkg/sink/sink.go
Outdated
case i.GitLab != nil:
interceptor = gitlab.NewInterceptor(i.GitLab, r.KubeClientSet, r.EventListenerNamespace, log)
interceptor = gitlab.NewInterceptor(i.GitHub, r.WebhookSecretStore, log)
This looks like a copy/paste mistake?
it does indeed ...
This would definitely help with scaling, both for the volume of requests and for the number of triggers that use secrets. This caching layer is something we do need to consider if we go for a process-based plugin mechanism.
@bigkevmcd I'll be picking this up again. There are actually a whole host of uncached requests going on in |
2ce7e18
to
e5f068d
Compare
So, on further investigation, the fix for this problem implemented in #595 did not work. The reason seems to be that the triggers are executed asynchronously, so they all immediately stampede to get the secret at the very top of the ExecuteTrigger methods. In practice you get 100% cache misses, 100% of the time, which was not what was intended, to put it mildly. This version speeds things up very significantly compared to master in my tests, which were more thorough than last time around ;-). It takes about 3s to process the hideous 500-github-trigger YAML I put in the examples in this PR, as opposed to something like 3 minutes on master. Most of that is now due to another, far less severe performance problem with compiling CEL expressions (will raise an issue when I get a second).
0155770
to
95c412b
Compare
pkg/interceptors/interceptors.go
Outdated
}

return make(map[string]interface{})
lw := cache.NewListWatchFromClient(cs.CoreV1().RESTClient(), "secrets", metav1.NamespaceAll, fields.Everything())
Using NamespaceAll here is probably not a long-term solution, since it requires cluster-wide permission to list secrets in RBAC. I guess we could wrap a map of stores here, one per namespace referenced in calls to Get ...
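A rough sketch of the map-of-stores idea, assuming a factory function that builds one store per namespace on first use. All names here (secretGetter, namespacedStores, staticStore) are hypothetical. In a real version the factory would presumably start a reflector scoped to that namespace by passing the namespace instead of metav1.NamespaceAll to cache.NewListWatchFromClient; the sketch replaces that with an in-memory map so it runs on the standard library alone.

```go
package main

import (
	"fmt"
	"sync"
)

// secretGetter is a hypothetical per-namespace secret store.
type secretGetter interface {
	Get(name string) ([]byte, error)
}

// namespacedStores lazily creates one store per namespace seen in Get,
// so only namespaces that are actually referenced need RBAC access.
type namespacedStores struct {
	mu       sync.Mutex
	stores   map[string]secretGetter
	newStore func(namespace string) secretGetter
}

func newNamespacedStores(newStore func(string) secretGetter) *namespacedStores {
	return &namespacedStores{stores: make(map[string]secretGetter), newStore: newStore}
}

func (n *namespacedStores) Get(namespace, name string) ([]byte, error) {
	n.mu.Lock()
	s, ok := n.stores[namespace]
	if !ok {
		// First reference to this namespace: build its store once.
		s = n.newStore(namespace)
		n.stores[namespace] = s
	}
	n.mu.Unlock()
	return s.Get(name)
}

// staticStore is an in-memory stand-in for a reflector-backed store.
type staticStore map[string][]byte

func (s staticStore) Get(name string) ([]byte, error) {
	v, ok := s[name]
	if !ok {
		return nil, fmt.Errorf("secret %q not found", name)
	}
	return v, nil
}

func main() {
	stores := newNamespacedStores(func(namespace string) secretGetter {
		fmt.Println("creating store for namespace", namespace)
		return staticStore{"webhook-token": []byte("token-for-" + namespace)}
	})
	v, err := stores.Get("team-a", "webhook-token")
	fmt.Println(string(v), err)
	v, err = stores.Get("team-a", "webhook-token") // reuses the existing store
	fmt.Println(string(v), err)
}
```

One trade-off of this layout is that the first Get for a new namespace pays the cost of starting a store, and a real implementation would also need to wait for that store's initial sync before answering.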
cmd/eventlistenersink/main.go
Outdated
@@ -80,12 +82,15 @@ func main() {
logger.Fatal(err)
}

webhookSecretStore := interceptors.NewWebhookSecretStore(kubeClient, sinkArgs.ElNamespace, 5*time.Second, stopCh)
The interval should probably be configurable here ...
4406021
to
1a33db2
Compare
@tragiclifestories: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
r.URL is nil sometimes, in which case this code will panic. This fix just handles the nil case.
Co-authored-by: Jace Tan <jaceys.tan@gmail.com>
cb3018a
to
3b4cc64
Compare
3b4cc64
to
e35c790
Compare
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing. |
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing. |
@tekton-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@tragiclifestories: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I hope I'll get some time to update this in the next week or so. Any ideas on how to write tests for this, and on why they don't currently pass, are very much welcome. FWIW, we've had a version of this PR running in production for over a month without incident.
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
/remove-lifecycle stale Promise I'll get to this soon ... |
/remove-lifecycle stale |
@tragiclifestories: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hey @tragiclifestories! Do you have any strong feelings about us closing this PR for now? You can absolutely re-open it (or open a new one) when you're ready to get back to it.
At this point the rebase would probably take longer than redoing it from scratch 😅. I think this PR can be closed. I will try to find a moment to see if the bug still exists - it may well not for all I know ...
Changes
This is an alternative implementation of #585 using client-go's cache package. We create a reflector for secrets across all namespaces and use it as a caching layer for secrets in both the GitHub/GitLab webhook parsers and CEL compareSecret calls.

I'm putting this up as a draft to get immediate feedback. Still need to test. Also, now that it's also backing CEL functions, WebhookSecretStore is probably the wrong name ...

Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Release Notes