[concurrency] Concurrency controller fails to cancel matching PipelineRuns #932
Comments
/assign
I think I know what's going on: the concurrency control relies on a label, "tekton.dev/pipeline:teps-linter", which is added by the PipelineRun controller rather than hard-coded in the trigger template. When I switch the concurrency control to rely on a label ("tekton.dev/check-name: pull-community-teps-lint") that is set by the trigger template rather than the PipelineRun controller, the previous PipelineRun is successfully canceled. I think the first reconcile loop of the concurrency controller happens before the PipelineRun controller has set this label, so the concurrency control never matches the PipelineRun. I'm still not sure why this seems to happen every time rather than only some of the time (as I'd expect from a race condition between the PipelineRun controller and the concurrency controller). One possibility is that the event with the message "the object has been modified; please apply your changes to the latest version and try again" comes from the update call in the PipelineRun controller that sets these labels, which fails because the concurrency webhook has already patched the labels when the PipelineRun was created. (Not sure if this is how reconciler queues and patches/updates actually work?) I'm also not sure what to do about this. One option would be to skip a concurrency controller reconcile loop if the label applied by the PipelineRun controller is not present yet. (A bit hacky, and we'd have to build these assumptions into the unit tests.) But what if a project has another controller that sets labels on PipelineRuns and wants to use them with concurrency controls?
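To make the "skip the reconcile loop" option concrete, here is a minimal sketch of the check it would need. It assumes a hypothetical helper named `labelsReady` and uses a plain map in place of the real PipelineRun type; how the controller would requeue the object is not shown.

```go
package main

import "fmt"

// labelsReady reports whether every label key that a concurrency control
// selects on is already present on the PipelineRun. This is a hypothetical
// helper for illustration; it is not part of the concurrency project.
func labelsReady(runLabels map[string]string, selectorKeys []string) bool {
	for _, key := range selectorKeys {
		if _, ok := runLabels[key]; !ok {
			return false
		}
	}
	return true
}

func main() {
	selectorKeys := []string{"tekton.dev/pipeline"}

	// Labels as they might look on the first reconcile: only the label
	// from the trigger template is set.
	first := map[string]string{
		"tekton.dev/check-name": "pull-community-teps-lint",
	}
	// Labels after the PipelineRun controller has added its own label.
	later := map[string]string{
		"tekton.dev/check-name": "pull-community-teps-lint",
		"tekton.dev/pipeline":   "teps-linter",
	}

	fmt.Println(labelsReady(first, selectorKeys)) // false: skip this reconcile
	fmt.Println(labelsReady(later, selectorKeys)) // true: apply the controls
}
```

A check like this only helps if the missing label can appear while the PipelineRun is still pending.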
Update: the reason this is deterministic is that the pipeline is not fetched until after the PipelineRun starts executing; i.e. the concurrency controller marks the PipelineRun as no longer pending, the PipelineRun begins executing, and only then does the PipelineRun controller apply the label "tekton.dev/pipeline". Therefore, I'm not sure how we can simultaneously prevent a PipelineRun from starting until concurrency controls are applied, and also allow concurrency controls to be applied with the label "tekton.dev/pipeline" (unless we change the PipelineRun controller behavior).
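The ordering in this update can be replayed with plain maps. The sketch below is a toy model, not controller code: `matches` and `replayOrdering` are hypothetical names, and exact key/value matching is an assumption about how a concurrency control selects PipelineRuns.

```go
package main

import "fmt"

// matches reports whether labels satisfy every key/value pair in selector.
// Hypothetical exact-match helper; the real controller may select differently.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// replayOrdering evaluates the selector at two points in time: at the first
// concurrency reconcile (only trigger-template labels exist) and after the
// PipelineRun has started (the PipelineRun controller has added its labels).
func replayOrdering(selector, templateLabels, controllerLabels map[string]string) (atReconcile, afterStart bool) {
	labels := map[string]string{}
	for k, v := range templateLabels {
		labels[k] = v
	}
	atReconcile = matches(selector, labels)

	for k, v := range controllerLabels {
		labels[k] = v
	}
	afterStart = matches(selector, labels)
	return atReconcile, afterStart
}

func main() {
	template := map[string]string{"tekton.dev/check-name": "pull-community-teps-lint"}
	controller := map[string]string{"tekton.dev/pipeline": "teps-linter"}

	// Control keyed on the label set by the PipelineRun controller.
	fmt.Println(replayOrdering(map[string]string{"tekton.dev/pipeline": "teps-linter"}, template, controller))
	// prints: false true

	// Control keyed on the label set by the trigger template.
	fmt.Println(replayOrdering(map[string]string{"tekton.dev/check-name": "pull-community-teps-lint"}, template, controller))
	// prints: true true
}
```

In this model the control keyed on "tekton.dev/pipeline" only matches after the point where cancellation would have been decided, which is consistent with the failure being deterministic rather than a race.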
I think the PipelineRun controller behavior should be modified, and a field similar to
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue. In response to this:
Expected Behavior
When a PipelineRun is created, the concurrency webhook marks it as pending with the label "tekton.dev/ok-to-start=true". When the PipelineRun is reconciled for the first time, the concurrency controller cancels all PipelineRuns matching the relevant concurrency controls (based on their labels), removes the label "tekton.dev/ok-to-start=true", and adds the label "tekton.dev/concurrency=true". The controller ignores PipelineRuns labeled "tekton.dev/concurrency=true", so the PipelineRun is not modified on subsequent reconcile loops.
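The expected flow can be sketched as an in-memory model. Everything here is an assumption made for illustration: the `run` struct stands in for a PipelineRun, a single exact-match selector stands in for a concurrency control, and `reconcile` is a hypothetical function, not the project's reconciler.

```go
package main

import "fmt"

const (
	okToStartLabel   = "tekton.dev/ok-to-start"
	concurrencyLabel = "tekton.dev/concurrency"
)

// run is a simplified in-memory stand-in for a PipelineRun.
type run struct {
	name      string
	labels    map[string]string
	cancelled bool
}

// matches reports whether labels satisfy every key/value pair in selector.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// reconcile models the expected behavior: on the first pass it cancels other
// runs matching the same control, swaps the marker labels, and returns the
// names it cancelled. Runs already labeled as handled are left untouched.
func reconcile(pr *run, all []*run, selector map[string]string) []string {
	if pr.labels[concurrencyLabel] == "true" {
		return nil // already handled on an earlier reconcile
	}
	if pr.labels == nil {
		pr.labels = map[string]string{}
	}
	var cancelled []string
	if matches(selector, pr.labels) {
		for _, other := range all {
			if other != pr && !other.cancelled && matches(selector, other.labels) {
				other.cancelled = true
				cancelled = append(cancelled, other.name)
			}
		}
	}
	delete(pr.labels, okToStartLabel)
	pr.labels[concurrencyLabel] = "true"
	return cancelled
}

func main() {
	selector := map[string]string{"tekton.dev/pipeline": "teps-linter"}
	oldRun := &run{name: "old-run", labels: map[string]string{
		"tekton.dev/pipeline": "teps-linter", concurrencyLabel: "true"}}
	newRun := &run{name: "new-run", labels: map[string]string{
		"tekton.dev/pipeline": "teps-linter", okToStartLabel: "true"}}
	all := []*run{oldRun, newRun}

	fmt.Println(reconcile(newRun, all, selector)) // prints: [old-run]
	fmt.Println(reconcile(newRun, all, selector)) // prints: []
}
```

If the new run lacks the selected label at its first reconcile, this model cancels nothing and still marks the run as handled, so later reconciles never revisit it.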
Actual Behavior
The controller does not cancel other PipelineRuns when the PipelineRun is first reconciled. I think the PipelineRun doesn't have its labels set when it's first reconciled, meaning that no other PipelineRuns are canceled, it's marked with "tekton.dev/concurrency=true", and it only gets the relevant labels... later? This means it never matches the concurrency control.
Steps to Reproduce the Problem
logging-config configmap in tekton-concurrency namespace.
Evidence
From the labels, we can see that each PipelineRun initially appears as "PipelineRunPending", with generation 2 and the label "tekton.dev/concurrency:true". Neither of them is ever canceled. We can see in the kubectl describe output for each PipelineRun some events that indicate patch calls may have failed due to concurrent writes (maybe?):
We can also see in the controller logs that this line of code was reached for both PRs, indicating that they were reconciled once by the controller before the "tekton.dev/concurrency:true" label was applied, but we don't see this line being reached, indicating that the concurrency control never matched the PipelineRun, despite both PipelineRuns always having the label "tekton.dev/pipeline:teps-linter" in kubectl output.
Additional Info
Concurrency project version hash: https://github.com/tektoncd/experimental/tree/7bce447ed20956ef28a22ccdbf28f1b60ff1b0f7/concurrency
Kubernetes version:
Output of kubectl version: