-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP - failure strategies using runOn 🎉 #2094
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The following is the coverage report on pkg/.
|
|
||
// RunOn declares a list of dependent tasks and their states that they have to be in to run this task. | ||
// +optional | ||
RunOn []PipelineTaskRunOn `json:"runOn,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new field runOn
as part of PipelineTask
Task string `json:"task"` | ||
// States is a list of states of a PipelineTask which is specified in the Task above | ||
States []PipelineTaskState `json:"states"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runOn
is a map with key as a task name and value as a list of states
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be a good time to bikeshed on the runOn
format i.e. map vs list. And I think @skaegi has some ideas around here as well :)
PipelineTaskStateSuccess PipelineTaskState = "success" | ||
// PipelineTaskStateFailure indicates that a PipelineTask has finished execution and has failed | ||
PipelineTaskStateFailure PipelineTaskState = "failure" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding two possible states to begin with: (1) success (2) failure, remaining two states (skip and always) will go here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think other states we'd need are 1. skip for condition failure 2. skip due to a previously failing taskrun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
humm, skip
for sure, and may be always
?
skip
would run when parent tasks skipping due to condition failure.
Not sure what (2) signify? By default, runOn
task is skipped if the states conflicts and that task is marked as conflicted (RunOnStateConflicted
), any task dependent on task with RunOnStateConflicted
is also skipped and marked RunOnStateConflicted
instead of failing the whole pipeline, please check pipeline.md for more details.
// Validate PipelineTask runOn and runAfter | ||
if err := validatePipelineTaskRunOn(ps.Tasks); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validating runOn
for not having any state other than whats allowed and making sure doesnt have duplicates
// GetSchedulable parses the whole graph and get a list of candidates based on already executed tasks | ||
// e.g. for this Pipeline task1 -> task2 (runAfter task1) -> task3 (runAfter task2) | ||
// GetSchedulable returns task1 followed by task2 (after task1 finishes) followed by task3 (after task2 finishes) | ||
candidateTasks, err := dag.GetSchedulable(d, pipelineState.ExecutedPipelineTaskNames()...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was little tricky to change, SuccessfulPipelineTaskNames
returned list of tasks which were finished executing successfully. Now, with runOn
failed tasks should also be included in the returned list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps pipelineState.CompletedPipelineTaskNames()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dibyom for suggestion, replaced with CompletedPipelineTaskNames
@@ -150,17 +169,67 @@ func (state PipelineRunState) IsDone() (isDone bool) { | |||
// GetNextTasks will return the next ResolvedPipelineRunTasks to execute, which are the ones in the | |||
// list of candidateTasks which aren't yet indicated in state to be running. | |||
func (state PipelineRunState) GetNextTasks(candidateTasks map[string]struct{}) []*ResolvedPipelineRunTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is where the most of the changes happened, for a task before adding it to the queue, check if runAfter
is specified, check if runOn
is specified, match the state in case runOn
is specified. Mark the task as conflicted when runOn
states does not match with the actual state, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented this already, but instead of marking the TaskRun as conflicted, could we simply not return it here? Is the "conflicted" status used to avoid having to match the runOn filters in various reconcile cycles?
Doesn't this pose the problem that if a taskrun depended on a conflicted one, with a runOn(failure), it would then get execute, instead of being skipped too?
The following is the coverage report on pkg/.
|
The following is the coverage report on pkg/.
|
e5b9d62
to
be80e52
Compare
The following is the coverage report on pkg/.
|
The following is the coverage report on pkg/.
|
The following is the coverage report on pkg/.
|
The following is the coverage report on pkg/.
|
its ready for review, will add support for skip and always after this PR is merged 😲 🤞 |
The following is the coverage report on pkg/.
|
This is the first subset of changes implementing failure strategies. This PR implements `runOn` with two states `success` and `failure`: task4: runOn: task: task1 states: ["success"] task: task2 states: ["failure"] task: task3 states: ["success", "failure"] Execute `task4` if: - task1 was successful AND - task2 was failure AND - task3 was either successful OR failure These changes should work for a pipeline with any number of branches i.e. Pipeline with single branch: Pipeline: task1 | failure task2 (runOn task1 failure) | success task3 (runOn task2 success) Pipeline with multiple branches: Pipeline: task1 success / \ failure task2 task3 By default (without any runOn), Pipeline execution bails out on first task failure. `runOn` confirms the state of the task specified matches with the actual state. `runOn` Success is defined by TaskRunStatus with Condition.Type set to `Succeeded` and Condition.Status set to `True`. `runOn` Failure is defined by TaskRunStatus with Condition.Type set to `Succeeded`, Condition.Status set to `False` and Condition.Reason set to `Failed`. With `runOn`, pipelineRun attempts to execute all the specified tasks. When a parent task does not match the state specified by its dependent task's runOn section, dependent task is skipped and marked with SkippedAsStateConflicted and continues executing rest of the tasks in pipeline. To clarify, in the above example, if task1 succeeds, task2 is executed since it depends on task1's success state and task3 is marked with SkippedAsStateConflicted but execution would continue in case there are more tasks and pipeline ends successfully with `Completed`. There are still two more states needs to be added as part of runOn, `skip` and `always` as per the design document.
The following is the coverage report on pkg/.
|
/test pull-tekton-pipeline-integration-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this! It looks great :)
I'm taking a bit of time to review the code - so I'll come back with proper feedback soon.
One thing that I feel is missing, but we may add on top, is the ability to have a "finally" type of syntax. With this is kind of possible, but one has to list all possible outcomes of a task in the runOn
, and still there would be no guarantee that it is executed as the pipeline might timeout in the meanwhile.
@@ -68,6 +68,8 @@ const ( | |||
// TaskRunSpecStatusCancelled indicates that the user wants to cancel the task, | |||
// if not already cancelled or terminated | |||
TaskRunSpecStatusCancelled = v1beta1.TaskRunSpecStatusCancelled | |||
// TaskRunSpecStatusSkipped indicates that the task should be skipped and not executed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is defined here as a generic skip, however looking at lines 251-259, it seems that this state is used for a specific case of skip, i.e. SkippedAsStateConflicted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these changes, TaskRun
is created for a task which depends on a certain state of a parent but parent ends up with different state, for example, TaskB
depends on failure of TaskA
but TaskA
finishes with success. In this scenario, a taskrun is created for TaskB
with this state, more specifically reason
is assigned this value. This state is very crucial (1) to determine the status of PipelineRun, Completed with a skipped task (2) tasks which are dependent on TaskB
are also marked with the same state instead of failing the whole pipeline
Note these two examples and how runOn
handles them:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see, thanks for the clarification!
I think this calls for a bit of extra dicussion, since the proposed behaviour diverges from what we do today:
- if a task fails, the pipeline is failed. Any task that was running until that point finishes, and no new task is scheduled. Tasks that are not executed have no taskrun associated and are not tracked in the pipelinerun status either
- if a task if not executed because of a
runAfter
condition that is not met, the taskrun is not created and not tracked in the pipelinerun status - if a task does not pass a condition, it is not executed, so no taskrun is created for it. However it is tracked in the pipelinerun status (see example below), because the conditions are part of it. tasks that depend on a task skipped because of a condition are not executed and not tracked in the pipelinerun status
status:
conditions:
- lastTransitionTime: "2020-03-12T21:02:42Z"
message: ConditionChecks failed for Task tekton-random-check-xsz9w-check-reset-dr9hr
in PipelineRun tekton-random-check-xsz9w
reason: ConditionCheckFailed
status: "False"
type: Succeeded
podName: ""
For runOn
, to be consistent with today's behaviour for runAfter
, we should not create a taskrun and not track it in the pipelinerun status.
If we wanted to be consisted with the conditions instead, we could track the evaluation of the runOn in the pipelinerun status, but not create a taskrun.
This said, I'm not necessarily advocating for our current approach. It is straight-forward and fairly readable, but it has a downside: anyone (CLI, Dashboard, human user) trying to understand what happened in the pipeline, need to consult the pipeline definition, infer the DAG and compare with the known states to fully understand what happened. For fairly simple pipelines this is easy to do, but it may not always be so obvious for more complex ones, especially in case of humans looking at API output directly.
One thing that we may consider doing in future is to track all taskruns in the pipelinerun status; if we do so we should for all cases, and I think the change should not happen in this PR.
Something I would not do in any case is to actually submit new taskruns resources to the API only to mark them as skipped. That information should be inferred from the parent object, the pipelinerun.
@dibyom @vdemeester @sbwsg @bobcatfish thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it would be great to discuss this further and get everyone's input 👍
This is certainly different than the way we handle conditional failures today. The biggest difference with runOn
is pipeline continues even after a conflict is discovered.
Today, we stop pipelinerun as soon as a task has failed or condition resulted in failure. In case when a task fails, pipelinerun
would have the taskrun
of the failed task with the failure reason.
tkn pr describe 01-pipelinerun-without-runon
💌 Message
TaskRun 01-pipelinerun-without-runon-task1-ln8zz has failed ("step-fail" exited with code 1 (image: "docker-pullable://ubuntu@sha256:04d48df82c938587820d7b6006f5071dbbffceb7ca01d2814f81857c631d44df"); for logs run: kubectl -n default logs 01-pipelinerun-without-runon-task1-ln8zz-pod-pjgmg -c step-fail)
🗂 Taskruns
NAME TASK NAME STARTED DURATION STATUS
∙ 01-pipelinerun-without-runon-task1-ln8zz task1 13 seconds ago 8 seconds Failed
Now, its possible to retrieve the taskrun
:
tkn tr describe 01-pipelinerun-without-runon-task1-ln8zz
Message
"step-fail" exited with code 1 (image: "docker-pullable://ubuntu@sha256:04d48df82c938587820d7b6006f5071dbbffceb7ca01d2814f81857c631d44df"); for logs run: kubectl -n default logs 01-pipelinerun-without-runon-task1-ln8zz-pod-pjgmg -c step-fail
NAME STATUS
∙ fail Error
Whereas for a task with a condition and when that condition fails, pipelinerun
contains taskrun
for the task:
tkn pr describe pipelinerun-condition-failures
🗂 Taskruns
NAME TASK NAME STARTED DURATION STATUS
∙ pipelinerun-condition-failures-task1-j46vj task1 --- --- Failed(ConditionCheckFailed)
And can not retrieve that taskrun
:
tkn tr describe pipelinerun-condition-failures-task1-j46vj
Error: failed to find taskrun "pipelinerun-condition-failures-task1-j46vj"
But at least taskrun
for the condition exists and can be retrieved:
tkn tr list
NAME STARTED DURATION STATUS
pipelinerun-condition-failures-task1-j46vj-false-conditio-gkllw 8 minutes ago 7 seconds Failed
In case of runOn
, if a taskrun
is listed in the pipelinerun
as conflicted, providing an ability to retrieve that taskrun
would be helpful. But we can certainly drop it out of this PR if needed.
tkn pr describe pipelinerun-runon-conflicted
🌡️ Status
STARTED DURATION STATUS
17 minutes ago 1 minute Succeeded(Completed)
🗂 Taskruns
NAME TASK NAME STARTED DURATION STATUS
∙ pipelinerun-runon-conflicted-task2-s942v task2 16 minutes ago 23 seconds Failed
∙ pipelinerun-runon-conflicted-task3-8mhh9 task3 16 minutes ago --- Failed(SkippedAsStateConflicted)
∙ pipelinerun-runon-conflicted-task1-trxwl task1 17 minutes ago 22 seconds Succeeded
tkn tr describe pipelinerun-runon-conflicted-task3-8mhh9
🌡️ Status
STARTED DURATION STATUS
17 minutes ago --- Failed(SkippedAsStateConflicted)
Message
TaskRun "pipelinerun-runon-conflicted-task3-8mhh9" was skipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, we stop pipelinerun as soon as a task has failed or condition resulted in failure. In case when a task fails,
pipelinerun
would have thetaskrun
of the failed task with the failure reason.
What do you mean by "condition resulted in failure"?
If the check in a condition is false, so the condition fails, the associated task is not executed, however the execution of the pipelinerun continues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And can not retrieve that
taskrun
:tkn tr describe pipelinerun-condition-failures-task1-j46vj Error: failed to find taskrun "pipelinerun-condition-failures-task1-j46vj"
Personally I think this correct, the task was not executed at all and so it was not created.
The information about what happened is available in the pipelinerun.
If we create a taskrun, even as a failed one, it will trigger the taskrun controller, which will emit events and do some work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This said, I'm not necessarily advocating for our current approach. It is straight-forward and fairly readable, but it has a downside: anyone (CLI, Dashboard, human user) trying to understand what happened in the pipeline, need to consult the pipeline definition, infer the DAG and compare with the known states to fully understand what happened. For fairly simple pipelines this is easy to do, but it may not always be so obvious for more complex ones, especially in case of humans looking at API output directly.
@afrittoli I think we need to capture this in an issue and work towards having a status
on PipelineRun
that is self descriptive in some ways (aka making it easier for cli, dashboard, .) to get information without having to query the Pipeline
spec.
cc @bobcatfish
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big +1 to tracking the entire state using the pipelinerun status instead of looking up taskruns
// compare the list of tasks in runAfter and in runOn | ||
// runOn should not have any extra task specified outside of runAfter | ||
missing := list.DiffLeft(dependentTasks, pt.RunAfter) | ||
if len(missing) > 0 { | ||
return apis.ErrInvalidValue( | ||
fmt.Sprintf("pipeline task %q has a list of tasks under runOn which does not exists in runAfter: %s", pt.Name, missing), | ||
fmt.Sprintf("spec.tasks[%d].%q.runOn", ptIdx, pt.Name), | ||
) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? We could even extend the runAfter syntax to support both an array of strings (like today) and an array of maps:
runAfer["taskA", "taskB"]
or
runAfter["taskA": ['success'], "taskB": ['success', 'failure']]
where the first syntax becomes a shorthand format for the ['success'] case.
Yes, we need support for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, thank you for this, it's great work!
I have now reviewed most of the code and I have two main questions that I would like to clarify:
- why do we need to mark tasks as conflicted, run them with failure, instead of skipping them and not returning them as candidates for execution?
- would it be possible to collapse the "runAfter" and "runOn" into one? they are strongly related, "runAfter" is a special case of "runOn"
- name: task3 | ||
taskRef: | ||
Name: successful-task | ||
runAfter: ["task1"] | ||
runOn: | ||
- task: task1 | ||
states: ["failure"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different in any way from:
- name: task3
taskRef:
Name: successful-task
runOn:
- task: task1
states: ["success", "failure"]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this means that task3 runs if and only if task1 has failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, so the runOn
overrides runAfter
in a way.
The meaning of runAfter to today is basically runOn['success']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, it does, and runAfter
today is runOn
with success
.
// PipelineTaskStateSuccess indicates that a PipelineTask has finished execution successfully | ||
PipelineTaskStateSuccess PipelineTaskState = "success" | ||
// PipelineTaskStateFailure indicates that a PipelineTask has finished execution and has failed | ||
PipelineTaskStateFailure PipelineTaskState = "failure" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failure aggregates everything that is not "success" and "unknown", right? i.e. skip (because of condition), cancel, timeout (at task or pipeline level)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and no, failure
is equivalent to failed
, i.e. not success and not unknown. skip
is equivalent to failed
with conditionCheckFailed
. Should we include cancel
or timeout
here in skip
? (skip
is not implemented in this PR)
seen := map[string]struct{}{} | ||
for _, s := range states { | ||
if _, ok := seen[strings.ToLower(s)]; ok { | ||
return apis.ErrMultipleOneOf(path) | ||
} | ||
seen[strings.ToLower(s)] = struct{}{} | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make a library for set handling, I think we've written this kind of code in some many places :D
|
||
candidateTasks, err := dag.GetSchedulable(d, pipelineState.SuccessfulPipelineTaskNames()...) | ||
// Get the next tasks to be executed which doesnt't have any TaskRuns created yet | ||
// ExecutedPipelineTaskNames gets the list of finished tasks (either successful or failed or skipped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the skipped ones does not include those skipped because of a condition?
Having a runOn(["skipped"]) could be a trick to provide a way to express an else
statement for a condition, which is something that we miss today, we need to define to opposite conditions for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I was thinking that could be a solution to #1023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, runOn: [skip]
is equivalent to run this task if parent was not executed because of a condition failure i.e.
if condition then
task1
else
task2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts, runOn: [skip] might be not a bit loose, at least for handling conditionals. If we had this setup (sorry about the approximated YAML syntax):
task1:
conditions:
- condA
taskElseCondA:
runOn: [ task1, skip ]
task2:
runAfter: [ task1 ]
conditions:
- condB
taskElseCondB:
runOn: [ task2, skip ]
If condA
fails, task1 willl be skipped. As a consequence task2
fill be skipped too.
taskElseCondA
will be executed, but what happens with taskElseCondB
?
Since task2
was skipped, it could be executed. However the intention of the pipeline author there is probaly that taskElseCondB
is executed if condB
is false, which we haven't evaluated.
I think it would be best to have the else
task bound to the conditional rather than the task associated to the conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right task 2 will be skipped. One way to do what you want would be for task2 to say runAfter: [task1: [success, skip]
@@ -466,20 +470,31 @@ func (c *Reconciler) reconcile(ctx context.Context, pr *v1alpha1.PipelineRun) er | |||
if rprt == nil { | |||
continue | |||
} | |||
if rprt.IsRunOnConflicted { | |||
rprt.TaskRun, err = c.createRunOnConflictedTaskRun(rprt, pr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to create the TaskRun here?
This TaskRun does not meet the precondition for being scheduled because it depends on a condition (as defined in runOn) that did not happen. And TaskRuns that depended on this one will never be scheduled because their predecessor in the DAG did not run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered above ⬆️ , once again, we can drop creating taskrun here but would be helpful to have one.
// IsFailurePermitted is driven by the runOn section of the dependent tasks, runOn: state: ["failure"] | ||
// it is set to true if any of the tasks within the pipeline has dependency on this task's failure state | ||
// by default, IsFailurePermitted is set to false | ||
IsFailurePermitted bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to propose a change in how we handle this logic here.
I'm not sure I've thought enough about this, but these are the basic ideas:
- any taskrun is "permitted" to fail, it simply fails
- we include failed taskruns in the list of executed tasks
- we include the filters defined by the
runOn
in the logic that calculates which task should be included in the list of schedulable ones at a certain point in time - we change the logic that decides when a pipelinerun is finished. The following conditions should be met: the method that gets the next schedulable task does not return anything, no taskrun is in a transient state. We need to make sure that we don't meet this condition when for instance taskruns are waiting for a pod to be scheduled, but I think we should be safe there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to propose a change in how we handle this logic here.
I'm not sure I've thought enough about this, but these are the basic ideas:
- any taskrun is "permitted" to fail, it simply fails
not sure if I am following this, on a high level here not any taskrun is permitted to fail, after task is executed and if it fails, pipelinerun determines to exit with failure or continue executing rest of the tasks based on this flag IsFailurePermitted
. This flag is set while parsing the pipelinerun.
- we include failed taskruns in the list of executed tasks
Yes, we do include failed taskruns in the list of executed tasks.
- we include the filters defined by the
runOn
in the logic that calculates which task should be included in the list of schedulable ones at a certain point in time
Yup, based on the states specified in runOn
and actual states of a task, a list of schedulable tasks is generated.
- we change the logic that decides when a pipelinerun is finished. The following conditions should be met: the method that gets the next schedulable task does not return anything, no taskrun is in a transient state. We need to make sure that we don't meet this condition when for instance taskruns are waiting for a pod to be scheduled, but I think we should be safe there
Again, not sure, if I am following this, but the logic which decides if pipeline is finished has changed and now includes, (1) check on failed task if failure is permitted (2) skipped tasks due to conflicts are counted as skipped
// by default, IsFailurePermitted is set to false | ||
IsFailurePermitted bool | ||
// IsRunOnConflicted is set to true for a task with runOn states conflicting with actual states of parent task | ||
IsRunOnConflicted bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely convinced that we need to store this state here.
Could the function that returns the list of schedulable TaskRuns use the filters defined in runOn and the state of TaskRuns executed until now to decide if a taskrun should be schedulable or not?
NIT: the "conflicted" term does not resonate with me, but I'm miserably failing to offer an alternative that sounds good. Perhaps using the opposite - isRunOnSatisfied
- but this does not sound very good to me either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely convinced that we need to store this state here.
Could the function that returns the list of schedulable TaskRuns use the filters defined in runOn and the state of TaskRuns executed until now to decide if a taskrun should be schedulable or not?
Yup, that's exactly how its implemented. A taskrun should be scheduled or not is determined solely using the filters (states) defined in runOn
and actual states of taskruns.
These flags are mainly used to determine (1) create skipped state taskrun if IsRunOnConflicted
is set (2) pipelinerun decides whether to continue when a task fails if IsFailurePermitted
is set for that task. Note here with DAG, a list of schedulable tasks is determined by looking at the N-1 level of the tree whereas IsFailurePermitted
is set for a parent task based on its children runOn
states.
NIT: the "conflicted" term does not resonate with me, but I'm miserably failing to offer an alternative that sounds good. Perhaps using the opposite -
isRunOnSatisfied
- but this does not sound very good to me either.
Haha 😂I am having same issue 😫, cannot come up with better option to signify conflict between a state specified in runOn
vs actual taskrun state.
@@ -129,6 +139,18 @@ func (t ResolvedPipelineRunTask) IsCancelled() bool { | |||
return c.IsFalse() && c.Reason == v1alpha1.TaskRunSpecStatusCancelled | |||
} | |||
|
|||
// isSkippedAsStateConflicted return true only if the taskRun has failed with reason SkippedAsStateConflicted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be semantically more correct to actually skip the taskrun, and not fail it with a "skipped" status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a way to signify that a parent task was skipped and hence its children should be skipped too and the task is visited and was skipped, and not pending for execution.
@@ -150,17 +169,67 @@ func (state PipelineRunState) IsDone() (isDone bool) { | |||
// GetNextTasks will return the next ResolvedPipelineRunTasks to execute, which are the ones in the | |||
// list of candidateTasks which aren't yet indicated in state to be running. | |||
func (state PipelineRunState) GetNextTasks(candidateTasks map[string]struct{}) []*ResolvedPipelineRunTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented this already, but instead of marking the TaskRun as conflicted, could we simply not return it here? Is the "conflicted" status used to avoid having to match the runOn filters in various reconcile cycles?
Doesn't this pose the problem that if a taskrun depended on a conflicted one, with a runOn(failure), it would then get execute, instead of being skipped too?
Thank you @afrittoli, very excited for this, its been fun!
Because we want to continue with an execution until there are no tasks left, and the biggest reason being, from an audit perspective, it would be convenient to know what happened to a task, was task executed, not executed, if not then why not. Also, to design a generic solution for simple use cases with single branch and more complex with multiple branches, for example,
If
Sure, we can, I am open to collapsing both, like @vdemeester also suggested. |
@afrittoli thanks a ton for detailed review, I think 🤔I have responded to all of your comments and conclude it with the following two items which might need further discussion:
|
Without registering |
Hey @pritidesai and all! I've come into this late and I have to admit it's a bit hard to follow what's going on at this point 😅 Which to me points toward us needing to do a bit more to flesh out the design before we would merge this entirely. The biggest question on my mind is:
I've tried to identify a few themes in the discussions so far:
I suggest we try to tease these apart and tackle them individually:
My recommendations for next steps:
We're going to be having a beta/workflow meeting on monday at 12pm EST - if you were able to attend @pritidesai we could discuss in more detail then, but if not let's find another time when we could meet and work out the details, if that works for you? If not then writing this down clearly will be more important than ever Anyway that's my 2 cents and again I am SO EXCITED to get this in - let's figure out what incremental steps we can start making so we can start merging PRs and not leave this big one open for a long time. ❤️ |
👍
Yup, we need a discussion on these topics, sooner the better.
I have no strong preference for one over the other. The changes in this PR has it implemented such that If we change it to not create One example here is:
Let's say, 😢 yup, there is an AND between
We could simplify by going back to list but will loose an ability to support this same example 😜:
this means, execute this task if both
humm 🤔,
Yup, I have it implemented but not added in this PR, its already becoming complicated 😫
Yup, syntax is the key, lets nail it down first. This PR has doc here:
Yup, I will see you during beta/working meeting on Monday 😄
Thanks a bunch @bobcatfish 😍, yup super excited 💃 |
I feel like creating a separate PR without creating taskRun for skipped tasks before we meet on Monday to better understand the changes, hopefully 🤞I can get it through with my three kids home 👶👶👧 |
+1
Personally I really feel that My preferred starting point for failure strategies would be to have the failure strategy attached to the task that fails:
My point on this was that we don't explicitly track skipped taskruns today, we simply do not run them, and the initial implementation of
|
/hold |
Closing this for now as we are revisiting the design, see #1684 (comment) for discussion! |
Changes
This is the first subset of changes implementing failure strategies.
This PR implements
runOn
with two statessuccess
andfailure
:Execute
task4
if:Pipeline Status with these changes: Succeeded (Completed)
Pipeline Status: Succeeded and Tasks Completed: 3, Skipped: 0, Failed: 1
These changes should work for a pipeline with any number of branches i.e.
Pipeline with single branch:
Pipeline with multiple branches:
synonym to:
Pipeline Status with these changes: Succeeded (Completed)
Pipeline Status: Succeeded and Tasks Completed: 2, Skipped: 1, Failed: 0
By default (without any runOn), Pipeline execution fails on very first task failure.
But
runOn
confirms the state of the task specified with the actual state.runOn
Success is defined by TaskRunStatus withCondition.Type
set toSucceeded
andCondition.Status
set toTrue
.runOn
Failure is defined by TaskRunStatus withCondition.Type
set toSucceeded
,Condition.Status
set toFalse
and Condition.Reason set toFailed
.With
runOn
, pipelineRun attempts to execute all the specified tasks. When a parent task does not match the state specified by its dependent task's runOn section, dependent task is marked as conflicted and continues executing rest of the pipeline. To clarify, in the above example, if task1 succeeds, task2 is executed since it depends on task1's success state and task3 is skipped and marked withSkippedAsStateConflicted
but execution continues if there are more tasks and pipeline ends Successfully withCompleted
.Note: There are still two more states needs to be added as part of runOn,
skip
andalways
as per the design document.Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
cmd
dir, please updatethe release Task to build and release this image.
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.
Release Notes