Idea: Pipeline Mutexes #2828

Open
wlynch opened this issue Jun 17, 2020 · 29 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@wlynch
Member

wlynch commented Jun 17, 2020

This was an idea that @k floated to me a while back, and I finally got around to opening an issue to discuss it. What I'm curious about:

  • Is this a use case we want to focus on?
  • Is it worth making this a built-in feature? (as opposed to a Catalog feature)
  • Are there any other features / alternatives to consider?

Details intentionally vague - this is a "should we do this?" issue, not a "how we'll do this" issue.

Idea

I may want to control how Pipelines run in relation to others and ensure only 1 pipeline for a given selector can run at a time (hence a "mutex").
I may want to reject new Pipelines if a similar one is already running, or queue them up and just make sure they do not run in parallel. This might be because:

  • I have a presubmit pipeline that I want to run only one instance of at a time per pull request to reduce costs (e.g. if someone pushes multiple commits, I only need to run the most recent and can cancel the currently running pipelines).
  • My pipeline mutates some external state, and I want to make sure only one thing operates on it at a time.

Possible solution

Have a mechanism to select conditions to allow Pipeline execution, as well as a strategy for what to do in response.

Examples

If a new Pipeline is created that was labelled as a pull request, cancel existing runs.

selector: repo=foo, type=pullrequest
strategy: cancel

Only run 1 pipeline at a time that was labelled as being started by a push to master. (does not guarantee ordering)

selector: repo=foo, type=push, ref=master
strategy: queue

Deny new pipeline create requests if they match a pipeline currently running.

selector: repo=foo, type=push, ref=master
strategy: deny
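
To make the three strategies concrete, here is a minimal Python sketch of how an admission check could behave. It is only an illustration of the idea above, not a proposed Tekton API: the `Run` class, the `admit` function, and the state names are all hypothetical, and the selector is treated as a plain equality match on labels.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical stand-in for a PipelineRun: a name, labels, and a state.
    name: str
    labels: dict
    state: str = "pending"  # pending | running | queued | cancelled | denied


def matches(selector: dict, labels: dict) -> bool:
    """True when every selector key/value pair is present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def admit(new_run: Run, existing: list, selector: dict, strategy: str) -> None:
    """Apply a mutex strategy to new_run, mutating run states in place."""
    if not matches(selector, new_run.labels):
        new_run.state = "running"  # the mutex does not apply to this run
        return
    # Runs that currently hold the mutex: running and matching the selector.
    holders = [r for r in existing
               if r.state == "running" and matches(selector, r.labels)]
    if not holders:
        new_run.state = "running"
    elif strategy == "cancel":
        for r in holders:
            r.state = "cancelled"
        new_run.state = "running"
    elif strategy == "queue":
        new_run.state = "queued"
    elif strategy == "deny":
        new_run.state = "denied"
    else:
        raise ValueError(f"unknown strategy: {strategy}")


selector = {"repo": "foo", "type": "pullrequest"}
old = Run("pr-run-1", {"repo": "foo", "type": "pullrequest"}, state="running")
new = Run("pr-run-2", {"repo": "foo", "type": "pullrequest"})
admit(new, [old], selector, "cancel")
print(old.state, new.state)  # cancelled running
```

The sketch leaves out what happens when a holder finishes (something would have to promote a queued run), which is where most of the real design work would be.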

Alternatives

Implement as a task

  • Cancellation could be handled by having the first step of every pipeline include something along the lines of

    kubectl delete pipelinerun -l foo=bar
    

    This would clobber any other Pipelines with that label.

  • Queueing could be handled by having a Condition that runs kubectl get for running pods and only proceeds if a condition is true. This is difficult since you'd have to get creative in inspecting runtime information of other runs (e.g. are they also in a wait state, or are they running?). This also creates container waste since the pipelines would all be running.

  • Deny could not be implemented this way.

@vdemeester
Member

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 18, 2020
@tragiclifestories

We've built a queueing system to manage our way around this problem, so a +1 for it being a useful thing to tackle. I don't know whether it should be a core primitive or in the catalog, but in our use case it was necessary at a very early stage, and for deployments specifically it seems to me that having one deployment per app per environment at a time, and ideally in a sensible order, is going to be a very common requirement. So I'm leaning towards 'core'.

@holly-cummins

I'd really appreciate this as well, perhaps with Task granularity rather than Pipeline granularity. My use case has to do with cross-talk between concurrent runs of integration tests/db management. In an integration test scenario, for example, the tasks depend on an external resource. If that resource is stateful (like a database), some tasks are rebuilding the database while others might be executing tests which use the database. I'd love to be able to single-thread pipeline runs through the integration test phase.

@resamaraschi

I also think this is a very common use case for a CI/CD Pipeline.
Our scenario is that we have one test cluster for all the created PRs. For instance, when 2 developers open 2 PRs, the pipeline should test one PR against the test cluster first and set the status for the corresponding PR. Meanwhile, all other PipelineRuns for other PRs should be queued until the cluster is free for the next run.

@holly-cummins

I got inspired by @tragiclifestories's suggestion of a queueing system as a workaround, so I made one too. I documented the steps; hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297

@tragiclifestories

Interesting!

We took a different approach by storing the queue data in configmaps and defining all the queue operations as scripts that run in task steps. So no explicit modelling through CRDs but it works well enough for our use case.
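
For readers wondering what a configmap-backed queue might look like, here is a rough Python sketch. It is not the implementation described above, whose details aren't given in this thread. It assumes the queue is kept as a JSON list under a single key of the ConfigMap's string-to-string `data` map, and a plain dict stands in for the ConfigMap; `QUEUE_KEY` and the function names are made up for illustration.

```python
import json

QUEUE_KEY = "queue"  # hypothetical key inside the ConfigMap's data map


def _load(data: dict) -> list:
    """Decode the queue; ConfigMap values are strings, so it is stored as JSON."""
    return json.loads(data.get(QUEUE_KEY, "[]"))


def enqueue(data: dict, run_name: str) -> None:
    """Append a run to the back of the queue unless it is already queued."""
    queue = _load(data)
    if run_name not in queue:
        queue.append(run_name)
    data[QUEUE_KEY] = json.dumps(queue)


def is_head(data: dict, run_name: str) -> bool:
    """A run may proceed only when it is at the front of the queue."""
    queue = _load(data)
    return bool(queue) and queue[0] == run_name


def dequeue(data: dict, run_name: str) -> None:
    """Remove a run from the queue once it has finished (or given up)."""
    queue = [name for name in _load(data) if name != run_name]
    data[QUEUE_KEY] = json.dumps(queue)


configmap_data = {}  # stand-in for the ConfigMap's data field
enqueue(configmap_data, "deploy-run-a")
enqueue(configmap_data, "deploy-run-b")
print(is_head(configmap_data, "deploy-run-b"))  # False
dequeue(configmap_data, "deploy-run-a")
print(is_head(configmap_data, "deploy-run-b"))  # True
```

In a real cluster, concurrent steps would race on the read-modify-write, so each update would need to rely on the Kubernetes API's optimistic concurrency (a stale `resourceVersion` causes a conflict that the script retries).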

Hopefully we'll get around to the blog-post stage of the project soon.

@afrittoli
Member

I got inspired by @tragiclifestories's suggestion of a queueing system as a workaround, so I made one too. I documented the steps; hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297

Nice :)
@pritidesai finally in action

@PoliM

PoliM commented Aug 11, 2020

Here is the use case we currently have. Imagine this simplified CD pipeline:
--> DeployToDev --> TestOnDev --> DeployToPreProd --> TestOnPreProd --> DeployToProd --> SmoketestOnProd
There are roughly three sections: Dev, PreProd and Prod.
The pipeline is started for a commit in a Git repository that holds the configuration of an application (GitOps). Now here are some requirements:

  • if a PipelineRun is in TestOnPreProd we don't want another PipelineRun to start DeployToPreProd because the test should be the result of the configuration that the first PipelineRun was started with.
  • but it is ok to have another PipelineRun starting to DeployToDev.
  • a PipelineRun must not overtake another PipelineRun because the CD pipeline should deliver the stuff in the order in which the configurations were committed to the Git repository.
  • to make things worse - we use the same pipeline to deploy different applications. So the exclusivity should be per section and application.

Currently we use a task in front of the section that polls a REST service to "ask to enter the section". The implementation of the REST service is specific to our pipeline and uses the Tekton API to analyse the state of all the PipelineRuns. It's ugly 😊 but it works so far.
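
The "ask to enter the section" behaviour can be sketched roughly as below. This is not the REST service described above (which inspects PipelineRuns through the Tekton API); it is a small in-memory Python model, with the `SectionGate` class and its method names invented for illustration. It assumes runs ask in the order they should be served, so ordering here is by arrival at the gate rather than by Git commit order.

```python
from collections import defaultdict, deque


class SectionGate:
    """Lets one run at a time into each (application, section), in request order."""

    def __init__(self):
        # (app, section) -> run names waiting, oldest first
        self._waiting = defaultdict(deque)
        # (app, section) -> run currently inside the section
        self._holder = {}

    def ask_to_enter(self, app: str, section: str, run: str) -> bool:
        """Safe to poll repeatedly; returns True once the run may enter."""
        key = (app, section)
        if self._holder.get(key) == run:
            return True  # already inside
        queue = self._waiting[key]
        if run not in queue:
            queue.append(run)
        if key not in self._holder and queue[0] == run:
            queue.popleft()
            self._holder[key] = run
            return True
        return False

    def leave(self, app: str, section: str, run: str) -> None:
        """Release the section so the next waiting run can enter."""
        key = (app, section)
        if self._holder.get(key) == run:
            del self._holder[key]


gate = SectionGate()
print(gate.ask_to_enter("shop", "PreProd", "run-1"))     # True
print(gate.ask_to_enter("shop", "PreProd", "run-2"))     # False, run-1 is inside
print(gate.ask_to_enter("shop", "Dev", "run-2"))         # True, different section
print(gate.ask_to_enter("billing", "PreProd", "run-9"))  # True, different application
```

Keying on the (application, section) pair covers the last requirement, and the per-key FIFO keeps a later run from overtaking an earlier one that is already waiting.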

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 9, 2020
@Letty5411
Contributor

I'd prefer both task and pipeline granularity mutex.

@Letty5411
Contributor

Hi @pritidesai , is there any update about this issue? Thanks :)

@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 10, 2020

@julweber

+1

@julweber

Dear tekton Team,

is there any update about this?

It would be great if there was an option to set pipeline runs to a serial mode.
In the scenario that a new pipeline run is added while a pipeline run for the same pipeline within the same namespace is already running, the new run could wait for the existing one to finish before being executed.

For example:
Concourse CI allows this via a serial job - https://concourse-ci.org/serial-job-example.html

Kind regards

@jerop
Member

jerop commented Apr 21, 2021

is there any update about this?

@julweber this is being explored in tekton experimental repo: tektoncd/experimental#699 - @imjasonh shared an idea in that issue

@julweber

Hey @jerop ,

sorry for the late reply. Thanks a lot for the link, i will have a look.

Cheers,
Julian

@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.


@afrittoli
Member

/reopen

@tekton-robot
Collaborator

@afrittoli: Reopened this issue.


@tekton-robot tekton-robot reopened this Oct 22, 2021
@afrittoli
Member

/remove-lifecycle rotten

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 22, 2021
@bobcatfish
Collaborator

Feels like this could be part of a possible solution to the discussion we've been having over at: tektoncd/plumbing#888 (comment)

If we want to take this forward I think what will really help is fleshing out the use cases that this feature would solve; @afrittoli this might not be quite the behavior you'd want for some of our common dogfooding use cases (tho it would be better than having a race!):

I may want to reject new Pipelines if a similar one is already running, or queue it up and just make sure it does not run in parallel.

For PR triggered PipelineRuns/TaskRuns I think what you often want is to run the newest one and cancel the others (e.g. imagining a PR being updated after kicking off PipelineRuns/TaskRuns)

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2022
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 22, 2022
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.


@lbernick
Member

lbernick commented Aug 5, 2022

/lifecycle frozen

@lbernick lbernick reopened this Aug 5, 2022
@tekton-robot tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 5, 2022
@vdemeester
Member

Related to tektoncd/community#716

Projects
Status: Todo
Development

No branches or pull requests