
Controlling max parallel jobs per pipeline #2591

Open
ysaakpr opened this issue May 9, 2020 · 38 comments
Labels
area/api Indicates an issue or PR that deals with the API. area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@ysaakpr

ysaakpr commented May 9, 2020

What is the way to control concurrency? The pipeline has 100 independent steps, but I don't want all 100 to run at once. I would also like to adjust the concurrency for different pipeline runs.

@imjasonh
Member

imjasonh commented May 9, 2020

There isn't a configuration for this today, but it should be possible if there's demand and the use cases make sense.

In the meantime you can run pipelines in a namespace with a resource limit such that no more than X CPUs are available to tasks, and those over the cap will queue until others finish. If you're just trying to limit the resource footprint this is likely the best way to express the limitation.

Can you give more details about why you want to limit concurrency of tasks?

https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/
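
As a rough illustration of the namespace approach, a quota along these lines caps the total CPU requests in a namespace (the quota name, namespace and values are placeholders, and it assumes the task pods get CPU requests set, e.g. via a LimitRange default as in the doc above):

# Sketch: cap the total CPU/memory that pods in this namespace may request.
# The quota name, namespace and values are placeholders.
kubectl create quota taskrun-cap \
  --namespace=my-pipelines \
  --hard=requests.cpu=4,requests.memory=8Gi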

@ysaakpr
Author

ysaakpr commented May 10, 2020

Using Kubernetes resource limits is one way, but it is hard to achieve in practice because of the kind of limits I would have to set dynamically, based on different aspects of the run.
In my pipeline, we are loading around 100 DB dumps into a new database to create a new environment from a seed database instance. All of the jobs can run in parallel, but if we run them all at once, the DB instance we are pulling data from gets choked and hits connection limit errors.

Controlling concurrency is much needed in a CI/CD system: even though everything could run in parallel, the user should have an option to limit it due to resource limitations/availability.

Note: Currently I have achieved this using our Consul server, using the consul CLI to wrap the run script in a lock on a shared key with a concurrency limit, e.g. the snippet below:

consul lock -n ${concurrency} -child-exit-code ${jobkey} "bash $script $@"

There are a few problems with this approach:

  1. The pods have already started running, so total time to completion becomes wait time + runtime of the script
  2. It is not easy to configure via pipeline run args; I need to modify my script to achieve concurrency control

But the same could be added directly as a feature to Tekton: use a semaphore to cap the max concurrent jobs on a pipelinerun and create pods only after the locks are acquired.

@ysaakpr
Author

ysaakpr commented May 10, 2020

@imjasonh I could contribute to this if someone can give me some hints on the code structure and the standards to follow, plus any existing technical concerns that would conflict with this behaviour.

@imjasonh
Member

Thanks, I think this seems like a reasonable addition, and would be happy to help you with it.

First, what kind of API addition are you envisioning? What's the minimum addition we can add that we could extend later if we need to? Is there any precedent in other workflow tools we could borrow/steal from?

Depending on the size of the change, we'd probably ask for a design doc answering those questions, and describing the use case (which you've done above, thanks for that!)

@ysaakpr
Author

ysaakpr commented May 10, 2020

At first glance, I am thinking of a property named concurrency on the pipelinerun and/or pipeline resource.
If defined, concurrency should be a value >= 0. Zero would pause the pipeline between runs, any positive value would set the maximum number of tasks that can run in parallel, and if no value is set, everything that can run in parallel will.

An initial version of this feature could be just a concurrency field on the pipeline run. I am not sure of an exactly similar feature we could borrow from, but GitLab has a runner concurrency setting that a user can set per runner.
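
Roughly, what I have in mind would look something like the sketch below. This is purely an illustration of the proposed field, nothing that exists in Tekton today, and the names are made up:

# Sketch of the proposed API only - the concurrency field does not exist yet.
cat <<'EOF' > pipelinerun-concurrency-sketch.yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: seed-db-run
spec:
  pipelineRef:
    name: seed-db
  concurrency: 10   # proposed: at most 10 TaskRuns of this run in flight at once
EOF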

@jlpettersson
Member

jlpettersson commented May 10, 2020

One way of handling this is by using a Pod quota for the Namespace.
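
For example, something along these lines (just a sketch; the quota name, namespace and pod count are placeholders):

# Sketch: limit the namespace to at most 10 pods at any one time.
kubectl create quota pod-cap --namespace=my-pipelines --hard=pods=10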

@ysaakpr
Author

ysaakpr commented May 11, 2020

One way of handling this is by using a Pod quota for the Namespace.

Of course, Kubernetes resource limits are an option as you mentioned, but they are not always practical. For example, in my namespace I am not running only Tekton. And from a configuration and usability point of view, using a pod quota is much more complex than setting a max-parallel value on the pipelinerun.

@imjasonh
Member

@ysaakpr Configuring pipeline-wide concurrency limits definitely seems easiest, but I wonder if that's expressive/powerful enough to satisfy more complex use cases. We should explore other options, even if only to be able to dismiss them with well thought out reasons.

Consider a pipeline that does two highly parallelizable things (e.g., initializing a database, then later deploying code to multiple AZs), but each of those parallelizable things has a different concurrency cap -- it might make sense to kick off 1000 init-db tasks at once, max 100 at a time, then later in that same pipeline kick off 10 deploy tasks, max 3 at a time. Configuring concurrency: 100 at the pipeline level wouldn't help limit the second group of tasks. A user could manually configure their pipeline to perform 3 deploy tasks in parallel, then the next 3, etc., but that's exactly the kind of manual configuration we're trying to avoid -- they also could have manually configured the pipeline to do 100 init-db tasks in parallel, then the next 100, etc., today, but that's toilsome.

(To be clear, this example isn't reason enough by itself to discount the pipeline-wide concurrency config, but it's worth considering and at least explicitly acknowledging this shortcoming.)

One way to express the different concurrency levels would be to group tasks together, then express concurrency limits per-group. Is that worth the additional config required to express/understand/visualize this grouping? I'm not sure. Would it be possible to support group-wise limits and pipeline-wide limits side-by-side? I truly have nothing to offer but open-ended questions. :)
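
Just to make that question concrete, a group-wise version might look vaguely like the sketch below. This is an entirely hypothetical shape, written out only to illustrate the trade-off, not a proposal:

# Entirely hypothetical sketch of per-group limits, for discussion only.
cat <<'EOF' > pipeline-group-concurrency-sketch.yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: init-and-deploy
spec:
  taskGroups:           # hypothetical grouping construct
  - name: init-db
    concurrency: 100    # hypothetical per-group cap
  - name: deploy
    concurrency: 3
EOF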

@imjasonh imjasonh changed the title Controlling max parrallel jobs per pipeline Controlling max parallel jobs per pipeline May 11, 2020
@ysaakpr
Author

ysaakpr commented May 11, 2020

@imjasonh that's a good thought. There are already two other tickets discussing task grouping in a pipeline:
#2592 and #2586 (comment).

As you mentioned, the idea of concurrency should not be limited to just the pipeline level. I agree that for a more complex pipeline, being able to configure this at the task-group level would be a great feature.

Pipeline-level concurrency would be the maximum/default concurrency, and task-group-level concurrency could be used to fine-tune it further.

@dibyom
Member

dibyom commented May 11, 2020

/kind feature
/area api

@tekton-robot tekton-robot added kind/feature Categorizes issue or PR as related to a new feature. area/api Indicates an issue or PR that deals with the API. labels May 11, 2020
@dibyom dibyom added the kind/design Categorizes issue or PR as related to design. label May 11, 2020
@vdemeester
Member

/priority important-longterm

@tekton-robot tekton-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label May 18, 2020
@ysaakpr
Author

ysaakpr commented May 20, 2020

How can I contribute to this? Is there a discussion forum where I can also be part of the design/implementation discussions?

@takirala
Contributor

takirala commented Jun 9, 2020

+1 for this feature.
I am also looking for something similar and open to contributing to any discussions/design/code.

@afrittoli afrittoli added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jun 15, 2020
@holly-cummins

See also #2828.

@jlpettersson
Member

I think it would not be so difficult to add logic for this.

E.g. right before we create a TaskRun, we could check whether there are fewer than X uncompleted TaskRuns, and otherwise skip creating a new one.

Later, when a TaskRun completes, the PipelineRun will be reconciled again and the creation of a TaskRun will be re-evaluated.
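
As a rough client-side analogue of that check (the real version would live in the PipelineRun reconciler; tekton.dev/pipelineRun is the label Tekton puts on TaskRuns created by a PipelineRun, while my-pipelinerun and MAX_PARALLEL are placeholders):

# Count TaskRuns of this PipelineRun whose Succeeded condition is still
# Unknown, i.e. not yet completed.
incomplete=$(kubectl get taskruns -l tekton.dev/pipelineRun=my-pipelinerun \
  -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Succeeded")].status}{"\n"}{end}' \
  | grep -c Unknown)

if [ "$incomplete" -lt "$MAX_PARALLEL" ]; then
  echo "below the cap - the reconciler would create the next TaskRun here"
fi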

@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 14, 2020
@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vdemeester
Member

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot tekton-robot reopened this Aug 17, 2020
@tekton-robot
Collaborator

@vdemeester: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 17, 2020
@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 23, 2021
@dibyom dibyom removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 9, 2021
@afrittoli
Member

/remove-lifecycle rotten

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 9, 2021
@afrittoli
Member

Controlling max taskruns: #3796

@afrittoli
Member

Related issue in experimental: tektoncd/experimental#699

@afrittoli
Member

Related approval task issue: tektoncd/experimental#728

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 15, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@afrittoli
Member

/remove-lifecycle rotten

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 2, 2022
@afrittoli
Member

/lifecycle frozen

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 2, 2022
@jerop jerop added the area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) label Feb 17, 2022
@amitjha780

Hi all, is there any stable solution available in Tekton for controlling concurrent builds for pipelines?
I have gone through many related issues/ideas raised with the community but have not found an exact solution:
#6112
#1305
#2134
If anyone has any views or a recent update on this open issue on the Tekton Community Roadmap (#2591), please update here.

@eudescosta

A Google search for controlling the parallel jobs per pipeline brought me here :)
+1, this would be extremely handy; looking forward to seeing this on the roadmap!

@sibelius

What is the best approach for now?

@eudescosta

What is the best approach for now?

I am looking at a shell script ... something along these lines:

check_job() {
  # Name of the most recently started PipelineRun matching <pipeline_name>.
  pipeline_run_name=$(kubectl get pipelinerun --sort-by=.status.startTime | grep -E '<pipeline_name>' | awk 'END{ print $1 }')
  # Prints "Running" if that PipelineRun is still in progress, nothing otherwise.
  kubectl get pipelinerun "$pipeline_run_name" -o jsonpath='{.status.conditions[?(@.reason == "Running")].reason}'
}
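
and then gate the next run on it, something like this (again just a sketch; the sleep interval and pipeline name are whatever fits your setup):

# Wait until no matching PipelineRun reports Running, then kick off the next one.
until [ -z "$(check_job)" ]; do
  sleep 30
done
tkn pipeline start <pipeline_name>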

@caiocampoos

+1 for this, it would be super useful.

Another similar feature would be cancelling pipelines based on other types of pipelines running.

Projects
Status: Todo
Development

No branches or pull requests