Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - failure strategies using runOn 🎉 #2094

Closed
wants to merge 1 commit into from

Conversation

pritidesai
Copy link
Member

@pritidesai pritidesai commented Feb 24, 2020

Changes

This is the first subset of changes implementing failure strategies.

This PR implements runOn with two states success and failure:

task4:
    runOn:
        task: task1
        states: ["success"]
        task: task2
        states: ["failure"]
        task: task3
        states: ["success", "failure"]

Execute task4 if:

  • task1 was successful AND
  • task2 was failure AND
  • task3 was either successful OR failure

Pipeline Status with these changes: Succeeded (Completed)
Pipeline Status: Succeeded and Tasks Completed: 3, Skipped: 0, Failed: 1

These changes should work for a pipeline with any number of branches i.e.

Pipeline with single branch:

 Pipeline:
         task1
           | failure
         task2 (runOn task1 failure)
           | success
         task3 (runOn task2 success)

Pipeline with multiple branches:

 Pipeline:
             task1
    success /     \ failure
         task2   task3

synonym to:

if task1:
  task2
else
  task3

Pipeline Status with these changes: Succeeded (Completed)
Pipeline Status: Succeeded and Tasks Completed: 2, Skipped: 1, Failed: 0

By default (without any runOn), Pipeline execution fails on very first task failure.

But runOn confirms the state of the task specified with the actual state.

runOn Success is defined by TaskRunStatus with Condition.Type set to Succeeded and Condition.Status set to True.
runOn Failure is defined by TaskRunStatus with Condition.Type set to Succeeded, Condition.Status set to False and Condition.Reason set to Failed.

With runOn, pipelineRun attempts to execute all the specified tasks. When a parent task does not match the state specified by its dependent task's runOn section, dependent task is marked as conflicted and continues executing rest of the pipeline. To clarify, in the above example, if task1 succeeds, task2 is executed since it depends on task1's success state and task3 is skipped and marked with SkippedAsStateConflicted but execution continues if there are more tasks and pipeline ends Successfully with Completed.

Note: There are still two more states needs to be added as part of runOn, skip and always as per the design document.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Adds the ability to be able to specify Tasks to run when other Tasks have failed

@googlebot googlebot added the cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit label Feb 24, 2020
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign dlorenc
You can assign the PR to them by writing /assign @dlorenc in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 24, 2020
@pritidesai pritidesai changed the title failure strategies using runOn 🎉 WIP - failure strategies using runOn 🎉 Feb 24, 2020
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 24, 2020
@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha2/pipeline_validation.go 94.2% 95.2% 1.0


// RunOn declares a list of dependent tasks and their states that they have to be in to run this task.
// +optional
RunOn []PipelineTaskRunOn `json:"runOn,omitempty"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new field runOn as part of PipelineTask

Task string `json:"task"`
// States is a list of states of a PipelineTask which is specified in the Task above
States []PipelineTaskState `json:"states"`
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runOn is a map with key as a task name and value as a list of states

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a good time to bikeshed on the runOn format i.e. map vs list. And I think @skaegi has some ideas around here as well :)

PipelineTaskStateSuccess PipelineTaskState = "success"
// PipelineTaskStateFailure indicates that a PipelineTask has finished execution and has failed
PipelineTaskStateFailure PipelineTaskState = "failure"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding two possible states to begin with: (1) success (2) failure, remaining two states (skip and always) will go here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think other states we'd need are 1. skip for condition failure 2. skip due to a previously failing taskrun

Copy link
Member Author

@pritidesai pritidesai Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

humm, skip for sure, and may be always?

skip would run when parent tasks skipping due to condition failure.

Not sure what (2) signify? By default, runOn task is skipped if the states conflicts and that task is marked as conflicted (RunOnStateConflicted), any task dependent on task with RunOnStateConflicted is also skipped and marked RunOnStateConflicted instead of failing the whole pipeline, please check pipeline.md for more details.

// Validate PipelineTask runOn and runAfter
if err := validatePipelineTaskRunOn(ps.Tasks); err != nil {
return err
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validating runOn for not having any state other than whats allowed and making sure doesnt have duplicates

// GetSchedulable parses the whole graph and get a list of candidates based on already executed tasks
// e.g. for this Pipeline task1 -> task2 (runAfter task1) -> task3 (runAfter task2)
// GetSchedulable returns task1 followed by task2 (after task1 finishes) followed by task3 (after task2 finishes)
candidateTasks, err := dag.GetSchedulable(d, pipelineState.ExecutedPipelineTaskNames()...)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was little tricky to change, SuccessfulPipelineTaskNames returned list of tasks which were finished executing successfully. Now, with runOn failed tasks should also be included in the returned list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps pipelineState.CompletedPipelineTaskNames() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dibyom for suggestion, replaced with CompletedPipelineTaskNames

@@ -150,17 +169,67 @@ func (state PipelineRunState) IsDone() (isDone bool) {
// GetNextTasks will return the next ResolvedPipelineRunTasks to execute, which are the ones in the
// list of candidateTasks which aren't yet indicated in state to be running.
func (state PipelineRunState) GetNextTasks(candidateTasks map[string]struct{}) []*ResolvedPipelineRunTask {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where the most of the changes happened, for a task before adding it to the queue, check if runAfter is specified, check if runOn is specified, match the state in case runOn is specified. Mark the task as conflicted when runOn states does not match with the actual state, etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented this already, but instead of marking the TaskRun as conflicted, could we simply not return it here? Is the "conflicted" status used to avoid having to match the runOn filters in various reconcile cycles?

Doesn't this pose the problem that if a taskrun depended on a conflicted one, with a runOn(failure), it would then get execute, instead of being skipped too?

@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha2/pipeline_validation.go 94.2% 95.2% 1.0

@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha2/pipeline_validation.go 94.2% 95.2% 1.0

@pritidesai pritidesai force-pushed the runon-19 branch 2 times, most recently from e5b9d62 to be80e52 Compare February 25, 2020 18:54
@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha2/pipeline_validation.go 94.2% 95.2% 1.0

@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha2/pipeline_validation.go 94.2% 95.2% 1.0

@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha2/pipeline_validation.go 94.2% 95.2% 1.0

@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha2/pipeline_validation.go 94.2% 95.2% 1.0

@pritidesai
Copy link
Member Author

its ready for review, will add support for skip and always after this PR is merged 😲 🤞

@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha1/taskrun_types.go 90.5% 86.4% -4.1
pkg/apis/pipeline/v1beta1/pipeline_validation.go 93.5% 94.6% 1.1
pkg/apis/pipeline/v1beta1/taskrun_types.go 71.4% 69.0% -2.5

This is the first subset of changes implementing failure strategies.
This PR implements `runOn` with two states `success` and `failure`:

task4:
    runOn:
        task: task1
        states: ["success"]
        task: task2
        states: ["failure"]
        task: task3
        states: ["success", "failure"]

Execute `task4` if:
- task1 was successful AND
- task2 was failure AND
- task3 was either successful OR failure

These changes should work for a pipeline with any number of branches i.e.

Pipeline with single branch:

Pipeline:
         task1
           | failure
         task2 (runOn task1 failure)
           | success
         task3 (runOn task2 success)

Pipeline with multiple branches:

Pipeline:
             task1
    success /     \ failure
         task2   task3

By default (without any runOn), Pipeline execution bails out on first task failure.
`runOn` confirms the state of the task specified matches with the actual state.
`runOn` Success is defined by TaskRunStatus with Condition.Type set to
`Succeeded` and Condition.Status set to `True`.
`runOn` Failure is defined by TaskRunStatus with Condition.Type set to
`Succeeded`, Condition.Status set to `False` and Condition.Reason set to `Failed`.

With `runOn`, pipelineRun attempts to execute all the specified tasks.
When a parent task does not match the state specified by its dependent task's
runOn section, dependent task is skipped and marked with SkippedAsStateConflicted
and continues executing rest of the tasks in pipeline. To clarify, in the above
example, if task1 succeeds, task2 is executed since it depends on task1's success
state and task3 is marked with SkippedAsStateConflicted but execution would continue
in case there are more tasks and pipeline ends successfully with `Completed`.

There are still two more states needs to be added as part of runOn, `skip` and
`always` as per the design document.
@tekton-robot
Copy link
Collaborator

The following is the coverage report on pkg/.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha1/taskrun_types.go 90.5% 90.9% 0.4
pkg/apis/pipeline/v1beta1/pipeline_validation.go 93.5% 94.6% 1.1
pkg/apis/pipeline/v1beta1/taskrun_types.go 71.4% 72.4% 1.0

@pritidesai
Copy link
Member Author

/test pull-tekton-pipeline-integration-tests

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! It looks great :)
I'm taking a bit of time to review the code - so I'll come back with proper feedback soon.

One thing that I feel is missing, but we may add on top, is the ability to have a "finally" type of syntax. With this is kind of possible, but one has to list all possible outcomes of a task in the runOn, and still there would be no guarantee that it is executed as the pipeline might timeout in the meanwhile.

@@ -68,6 +68,8 @@ const (
// TaskRunSpecStatusCancelled indicates that the user wants to cancel the task,
// if not already cancelled or terminated
TaskRunSpecStatusCancelled = v1beta1.TaskRunSpecStatusCancelled
// TaskRunSpecStatusSkipped indicates that the task should be skipped and not executed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defined here as a generic skip, however looking at lines 251-259, it seems that this state is used for a specific case of skip, i.e. SkippedAsStateConflicted.

Copy link
Member Author

@pritidesai pritidesai Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes, TaskRun is created for a task which depends on a certain state of a parent but parent ends up with different state, for example, TaskB depends on failure of TaskA but TaskA finishes with success. In this scenario, a taskrun is created for TaskB with this state, more specifically reason is assigned this value. This state is very crucial (1) to determine the status of PipelineRun, Completed with a skipped task (2) tasks which are dependent on TaskB are also marked with the same state instead of failing the whole pipeline

Note these two examples and how runOn handles them:

https://github.com/tektoncd/pipeline/blob/64d326f29cee582d7f5a75ae6e9e73fe691cc289/docs/pipelines.md#state-of-a-parent-task-specified-in-runon-section-conflicts-with-the-actual-state

https://github.com/tektoncd/pipeline/blob/64d326f29cee582d7f5a75ae6e9e73fe691cc289/docs/pipelines.md#task-with-multiple-dependencies-based-on-different-states

Copy link
Member

@afrittoli afrittoli Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see, thanks for the clarification!

I think this calls for a bit of extra dicussion, since the proposed behaviour diverges from what we do today:

  • if a task fails, the pipeline is failed. Any task that was running until that point finishes, and no new task is scheduled. Tasks that are not executed have no taskrun associated and are not tracked in the pipelinerun status either
  • if a task if not executed because of a runAfter condition that is not met, the taskrun is not created and not tracked in the pipelinerun status
  • if a task does not pass a condition, it is not executed, so no taskrun is created for it. However it is tracked in the pipelinerun status (see example below), because the conditions are part of it. tasks that depend on a task skipped because of a condition are not executed and not tracked in the pipelinerun status
      status:
        conditions:
        - lastTransitionTime: "2020-03-12T21:02:42Z"
          message: ConditionChecks failed for Task tekton-random-check-xsz9w-check-reset-dr9hr
            in PipelineRun tekton-random-check-xsz9w
          reason: ConditionCheckFailed
          status: "False"
          type: Succeeded
        podName: ""

For runOn, to be consistent with today's behaviour for runAfter, we should not create a taskrun and not track it in the pipelinerun status.
If we wanted to be consisted with the conditions instead, we could track the evaluation of the runOn in the pipelinerun status, but not create a taskrun.

This said, I'm not necessarily advocating for our current approach. It is straight-forward and fairly readable, but it has a downside: anyone (CLI, Dashboard, human user) trying to understand what happened in the pipeline, need to consult the pipeline definition, infer the DAG and compare with the known states to fully understand what happened. For fairly simple pipelines this is easy to do, but it may not always be so obvious for more complex ones, especially in case of humans looking at API output directly.

One thing that we may consider doing in future is to track all taskruns in the pipelinerun status; if we do so we should for all cases, and I think the change should not happen in this PR.

Something I would not do in any case is to actually submit new taskruns resources to the API only to mark them as skipped. That information should be inferred from the parent object, the pipelinerun.

@dibyom @vdemeester @sbwsg @bobcatfish thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it would be great to discuss this further and get everyone's input 👍

This is certainly different than the way we handle conditional failures today. The biggest difference with runOn is pipeline continues even after a conflict is discovered.

Today, we stop pipelinerun as soon as a task has failed or condition resulted in failure. In case when a task fails, pipelinerun would have the taskrun of the failed task with the failure reason.

tkn pr describe 01-pipelinerun-without-runon

💌 Message

TaskRun 01-pipelinerun-without-runon-task1-ln8zz has failed ("step-fail" exited with code 1 (image: "docker-pullable://ubuntu@sha256:04d48df82c938587820d7b6006f5071dbbffceb7ca01d2814f81857c631d44df"); for logs run: kubectl -n default logs 01-pipelinerun-without-runon-task1-ln8zz-pod-pjgmg -c step-fail)

🗂  Taskruns

 NAME                                         TASK NAME   STARTED          DURATION    STATUS
 ∙ 01-pipelinerun-without-runon-task1-ln8zz   task1       13 seconds ago   8 seconds   Failed

Now, its possible to retrieve the taskrun:

tkn tr describe 01-pipelinerun-without-runon-task1-ln8zz

Message

"step-fail" exited with code 1 (image: "docker-pullable://ubuntu@sha256:04d48df82c938587820d7b6006f5071dbbffceb7ca01d2814f81857c631d44df"); for logs run: kubectl -n default logs 01-pipelinerun-without-runon-task1-ln8zz-pod-pjgmg -c step-fail

 NAME     STATUS
 ∙ fail   Error

Whereas for a task with a condition and when that condition fails, pipelinerun contains taskrun for the task:

tkn pr describe pipelinerun-condition-failures
🗂  Taskruns

 NAME                                           TASK NAME   STARTED   DURATION   STATUS
 ∙ pipelinerun-condition-failures-task1-j46vj   task1       ---       ---        Failed(ConditionCheckFailed)

And can not retrieve that taskrun:

tkn tr describe pipelinerun-condition-failures-task1-j46vj
Error: failed to find taskrun "pipelinerun-condition-failures-task1-j46vj"

But at least taskrun for the condition exists and can be retrieved:

tkn tr list
NAME                                                              STARTED          DURATION    STATUS
pipelinerun-condition-failures-task1-j46vj-false-conditio-gkllw   8 minutes ago    7 seconds   Failed

In case of runOn, if a taskrun is listed in the pipelinerun as conflicted, providing an ability to retrieve that taskrun would be helpful. But we can certainly drop it out of this PR if needed.

tkn pr describe pipelinerun-runon-conflicted
🌡️  Status

STARTED          DURATION   STATUS
17 minutes ago   1 minute   Succeeded(Completed)

🗂  Taskruns

 NAME                                                            TASK NAME   STARTED          DURATION     STATUS
 ∙ pipelinerun-runon-conflicted-task2-s942v   task2       16 minutes ago   23 seconds   Failed
 ∙ pipelinerun-runon-conflicted-task3-8mhh9   task3       16 minutes ago   ---          Failed(SkippedAsStateConflicted)
 ∙ pipelinerun-runon-conflicted-task1-trxwl   task1       17 minutes ago   22 seconds   Succeeded
 tkn tr describe pipelinerun-runon-conflicted-task3-8mhh9
🌡️  Status

STARTED          DURATION    STATUS
17 minutes ago   ---         Failed(SkippedAsStateConflicted)

Message

TaskRun "pipelinerun-runon-conflicted-task3-8mhh9" was skipped

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today, we stop pipelinerun as soon as a task has failed or condition resulted in failure. In case when a task fails, pipelinerun would have the taskrun of the failed task with the failure reason.

What do you mean by "condition resulted in failure"?
If the check in a condition is false, so the condition fails, the associated task is not executed, however the execution of the pipelinerun continues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can not retrieve that taskrun:

tkn tr describe pipelinerun-condition-failures-task1-j46vj
Error: failed to find taskrun "pipelinerun-condition-failures-task1-j46vj"

Personally I think this correct, the task was not executed at all and so it was not created.
The information about what happened is available in the pipelinerun.
If we create a taskrun, even as a failed one, it will trigger the taskrun controller, which will emit events and do some work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This said, I'm not necessarily advocating for our current approach. It is straight-forward and fairly readable, but it has a downside: anyone (CLI, Dashboard, human user) trying to understand what happened in the pipeline, need to consult the pipeline definition, infer the DAG and compare with the known states to fully understand what happened. For fairly simple pipelines this is easy to do, but it may not always be so obvious for more complex ones, especially in case of humans looking at API output directly.

@afrittoli I think we need to capture this in an issue and work towards having a status on PipelineRun that is self descriptive in some ways (aka making it easier for cli, dashboard, .) to get information without having to query the Pipeline spec.

cc @bobcatfish

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big +1 to tracking the entire state using the pipelinerun status instead of looking up taskruns

Comment on lines +331 to +340
// compare the list of tasks in runAfter and in runOn
// runOn should not have any extra task specified outside of runAfter
missing := list.DiffLeft(dependentTasks, pt.RunAfter)
if len(missing) > 0 {
return apis.ErrInvalidValue(
fmt.Sprintf("pipeline task %q has a list of tasks under runOn which does not exists in runAfter: %s", pt.Name, missing),
fmt.Sprintf("spec.tasks[%d].%q.runOn", ptIdx, pt.Name),
)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? We could even extend the runAfter syntax to support both an array of strings (like today) and an array of maps:

runAfer["taskA", "taskB"]

or

runAfter["taskA": ['success'], "taskB": ['success', 'failure']]

where the first syntax becomes a shorthand format for the ['success'] case.

@pritidesai
Copy link
Member Author

Thank you for this! It looks great :)
I'm taking a bit of time to review the code - so I'll come back with proper feedback soon.

One thing that I feel is missing, but we may add on top, is the ability to have a "finally" type of syntax. With this is kind of possible, but one has to list all possible outcomes of a task in the runOn, and still there would be no guarantee that it is executed as the pipeline might timeout in the meanwhile.

Yes, we need support for finally type of syntax which I foresee as a bool field in PipelineTask. finally needs more discussion and would be ideal to add after we add skip support as part of runOn.

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, thank you for this, it's great work!

I have now reviewed most of the code and I have two main questions that I would like to clarify:

  • why do we need to mark tasks as conflicted, run them with failure, instead of skipping them and not returning them as candidates for execution?
  • would it be possible to collapse the "runAfter" and "runOn" into one? they are strongly related, "runAfter" is a special case of "runOn"

Comment on lines +701 to +707
- name: task3
taskRef:
Name: successful-task
runAfter: ["task1"]
runOn:
- task: task1
states: ["failure"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different in any way from:

   - name: task3
      taskRef:
        Name: successful-task
      runOn:
        - task: task1
          states: ["success", "failure"]

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means that task3 runs if and only if task1 has failed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, so the runOn overrides runAfter in a way.
The meaning of runAfter to today is basically runOn['success']

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, it does, and runAfter today is runOn with success.

// PipelineTaskStateSuccess indicates that a PipelineTask has finished execution successfully
PipelineTaskStateSuccess PipelineTaskState = "success"
// PipelineTaskStateFailure indicates that a PipelineTask has finished execution and has failed
PipelineTaskStateFailure PipelineTaskState = "failure"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure aggregates everything that is not "success" and "unknown", right? i.e. skip (because of condition), cancel, timeout (at task or pipeline level)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and no, failure is equivalent to failed, i.e. not success and not unknown. skip is equivalent to failed with conditionCheckFailed. Should we include cancel or timeout here in skip? (skip is not implemented in this PR)

Comment on lines +345 to +352
seen := map[string]struct{}{}
for _, s := range states {
if _, ok := seen[strings.ToLower(s)]; ok {
return apis.ErrMultipleOneOf(path)
}
seen[strings.ToLower(s)] = struct{}{}
}
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make a library for set handling, I think we've written this kind of code in some many places :D


candidateTasks, err := dag.GetSchedulable(d, pipelineState.SuccessfulPipelineTaskNames()...)
// Get the next tasks to be executed which doesnt't have any TaskRuns created yet
// ExecutedPipelineTaskNames gets the list of finished tasks (either successful or failed or skipped)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the skipped ones does not include those skipped because of a condition?
Having a runOn(["skipped"]) could be a trick to provide a way to express an else statement for a condition, which is something that we miss today, we need to define to opposite conditions for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I was thinking that could be a solution to #1023

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, runOn: [skip] is equivalent to run this task if parent was not executed because of a condition failure i.e.

if condition then
    task1
else
    task2

Copy link
Member

@afrittoli afrittoli Mar 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, runOn: [skip] might be not a bit loose, at least for handling conditionals. If we had this setup (sorry about the approximated YAML syntax):

task1:
  conditions:
    - condA

taskElseCondA:
    runOn: [ task1, skip ]

task2:
  runAfter: [ task1 ]
  conditions:
    - condB

taskElseCondB:
    runOn: [ task2, skip ]

If condA fails, task1 willl be skipped. As a consequence task2 fill be skipped too.
taskElseCondA will be executed, but what happens with taskElseCondB?
Since task2 was skipped, it could be executed. However the intention of the pipeline author there is probaly that taskElseCondB is executed if condB is false, which we haven't evaluated.

I think it would be best to have the else task bound to the conditional rather than the task associated to the conditional.

Copy link
Member

@dibyom dibyom Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right task 2 will be skipped. One way to do what you want would be for task2 to say runAfter: [task1: [success, skip]

@@ -466,20 +470,31 @@ func (c *Reconciler) reconcile(ctx context.Context, pr *v1alpha1.PipelineRun) er
if rprt == nil {
continue
}
if rprt.IsRunOnConflicted {
rprt.TaskRun, err = c.createRunOnConflictedTaskRun(rprt, pr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to create the TaskRun here?
This TaskRun does not meet the precondition for being scheduled because it depends on a condition (as defined in runOn) that did not happen. And TaskRuns that depended on this one will never be scheduled because their predecessor in the DAG did not run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered above ⬆️ , once again, we can drop creating taskrun here but would be helpful to have one.

// IsFailurePermitted is driven by the runOn section of the dependent tasks, runOn: state: ["failure"]
// it is set to true if any of the tasks within the pipeline has dependency on this task's failure state
// by default, IsFailurePermitted is set to false
IsFailurePermitted bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to propose a change in how we handle this logic here.
I'm not sure I've thought enough about this, but these are the basic ideas:

  • any taskrun is "permitted" to fail, it simply fails
  • we include failed taskruns in the list of executed tasks
  • we include the filters defined by the runOn in the logic that calculates which task should be included in the list of schedulable ones at a certain point in time
  • we change the logic that decides when a pipelinerun is finished. The following conditions should be met: the method that gets the next schedulable task does not return anything, no taskrun is in a transient state. We need to make sure that we don't meet this condition when for instance taskruns are waiting for a pod to be scheduled, but I think we should be safe there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to propose a change in how we handle this logic here.
I'm not sure I've thought enough about this, but these are the basic ideas:

  • any taskrun is "permitted" to fail, it simply fails

not sure if I am following this, on a high level here not any taskrun is permitted to fail, after task is executed and if it fails, pipelinerun determines to exit with failure or continue executing rest of the tasks based on this flag IsFailurePermitted. This flag is set while parsing the pipelinerun.

  • we include failed taskruns in the list of executed tasks

Yes, we do include failed taskruns in the list of executed tasks.

  • we include the filters defined by the runOn in the logic that calculates which task should be included in the list of schedulable ones at a certain point in time

Yup, based on the states specified in runOn and actual states of a task, a list of schedulable tasks is generated.

  • we change the logic that decides when a pipelinerun is finished. The following conditions should be met: the method that gets the next schedulable task does not return anything, no taskrun is in a transient state. We need to make sure that we don't meet this condition when for instance taskruns are waiting for a pod to be scheduled, but I think we should be safe there

Again, not sure, if I am following this, but the logic which decides if pipeline is finished has changed and now includes, (1) check on failed task if failure is permitted (2) skipped tasks due to conflicts are counted as skipped

// by default, IsFailurePermitted is set to false
IsFailurePermitted bool
// IsRunOnConflicted is set to true for a task with runOn states conflicting with actual states of parent task
IsRunOnConflicted bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely convinced that we need to store this state here.
Could the function that returns the list of schedulable TaskRuns use the filters defined in runOn and the state of TaskRuns executed until now to decide if a taskrun should be schedulable or not?

NIT: the "conflicted" term does not resonate with me, but I'm miserably failing to offer an alternative that sounds good. Perhaps using the opposite - isRunOnSatisfied - but this does not sound very good to me either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely convinced that we need to store this state here.
Could the function that returns the list of schedulable TaskRuns use the filters defined in runOn and the state of TaskRuns executed until now to decide if a taskrun should be schedulable or not?

Yup, that's exactly how its implemented. A taskrun should be scheduled or not is determined solely using the filters (states) defined in runOn and actual states of taskruns.
These flags are mainly used to determine (1) create skipped state taskrun if IsRunOnConflicted is set (2) pipelinerun decides whether to continue when a task fails if IsFailurePermitted is set for that task. Note here with DAG, a list of schedulable tasks is determined by looking at the N-1 level of the tree whereas IsFailurePermitted is set for a parent task based on its children runOn states.

NIT: the "conflicted" term does not resonate with me, but I'm miserably failing to offer an alternative that sounds good. Perhaps using the opposite - isRunOnSatisfied - but this does not sound very good to me either.

Haha 😂I am having same issue 😫, cannot come up with better option to signify conflict between a state specified in runOn vs actual taskrun state.

@@ -129,6 +139,18 @@ func (t ResolvedPipelineRunTask) IsCancelled() bool {
return c.IsFalse() && c.Reason == v1alpha1.TaskRunSpecStatusCancelled
}

// isSkippedAsStateConflicted return true only if the taskRun has failed with reason SkippedAsStateConflicted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be semantically more correct to actually skip the taskrun, and not fail it with a "skipped" status.

Copy link
Member Author

@pritidesai pritidesai Mar 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a way to signify that a parent task was skipped and hence its children should be skipped too and the task is visited and was skipped, and not pending for execution.

@@ -150,17 +169,67 @@ func (state PipelineRunState) IsDone() (isDone bool) {
// GetNextTasks will return the next ResolvedPipelineRunTasks to execute, which are the ones in the
// list of candidateTasks which aren't yet indicated in state to be running.
func (state PipelineRunState) GetNextTasks(candidateTasks map[string]struct{}) []*ResolvedPipelineRunTask {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented this already, but instead of marking the TaskRun as conflicted, could we simply not return it here? Is the "conflicted" status used to avoid having to match the runOn filters in various reconcile cycles?

Doesn't this pose the problem that if a taskrun depended on a conflicted one, with a runOn(failure), it would then get execute, instead of being skipped too?

@pritidesai
Copy link
Member Author

Again, thank you for this, it's great work!

Thank you @afrittoli, very excited for this, its been fun!

I have now reviewed most of the code and I have two main questions that I would like to clarify:

  • why do we need to mark tasks as conflicted, run them with failure, instead of skipping them and not returning them as candidates for execution?

Because we want to continue with an execution until there are no tasks left, and the biggest reason being, from an audit perspective, it would be convenient to know what happened to a task, was task executed, not executed, if not then why not. Also, to design a generic solution for simple use cases with single branch and more complex with multiple branches, for example,

          task1 (finishes with success)
 failure |
            v
          task2

task2 is marked as conflicted and PipelineRun finishes with Completed and 1 skipped task. In this case, we can skip task2 instead of returning it as a candidate for execution. But to apply the same logic to more complex use case is challenging. Complex use case includes Pipeline with two ore more branches, we need a way to distinguish when a task actually failed vs when a conflict happens, for example,

                  task1
    success /     \ failure
         task2   task3
   failure |       | success
         task4   task5

If task1 is successful, task3 and task5 both are marked as conflicted whereas task2 and task4 finishes execution and PipelineRun is marked as Completed with two tasks skipped.

  • would it be possible to collapse the "runAfter" and "runOn" into one? they are strongly related, "runAfter" is a special case of "runOn"

Sure, we can, I am open to collapsing both, like @vdemeester also suggested.

@pritidesai
Copy link
Member Author

pritidesai commented Mar 15, 2020

@afrittoli thanks a ton for detailed review, I think 🤔I have responded to all of your comments and conclude it with the following two items which might need further discussion:

  1. Should runOn derive task execution order in addition to runAfter? like you and @vdemeester suggested here WIP - failure strategies using runOn 🎉 #2094 (comment)

  2. Whether to create a taskrun in case when a task has specified certain runOn states but actual states are different i.e. conflict. If we decide not to create a skipped taskrun i.e. change the way its implemented in this PR, we will need a way to distinguish a task is visited but was skipped vs a task hasn't attempted, this is tricky from code perspective as GetNextTasks function does not have a pointer receiver and that's why PipelineRunState does not convey that message to reconciler, which can be changed but something to note.

@pritidesai
Copy link
Member Author

  1. Whether to create a taskrun in case when a task has specified certain runOn states but actual states are different i.e. conflict. If we decide not to create a skipped taskrun i.e. change the way its implemented in this PR, we will need a way to distinguish a task is visited but was skipped vs a task hasn't attempted, this is tricky from code perspective as GetNextTasks function does not have a pointer receiver and that's why PipelineRunState does not convey that message to reconciler, which can be changed but something to note.

Without registering taskRun in the reconciler for skipped tasks and only having the status in the PipelineRunStatus, retrieving tasks dependent on these skipped tasks is failing (tried couple of options today 😞), I can look into it more after we decide the best way to represent skipped tasks.

@bobcatfish
Copy link
Collaborator

Hey @pritidesai and all! I've come into this late and I have to admit it's a bit hard to follow what's going on at this point 😅 Which to me points toward us needing to do a bit more to flesh out the design before we would merge this entirely.

The biggest question on my mind is:

  • What is the smallest step we can take toward this right now, as we continue to hammer out the details?

I've tried to identify a few themes in the discussions so far:

  1. Creating taskrun objects for tasks that have not executed
  2. The syntax for runAfter + runOn
  3. The "conflicted" state
  4. Handling "skipped" tasks

I suggest we try to tease these apart and tackle them individually:

  • (1) <-- i strongly feel we should NOT be creating taskrun objects for Tasks that are not executing. If the core issue is being able to see the state of all of the tasks in the PipelineRun, I think we can do that by updating the status of the pipelinerun and don't need to create actual taskruns (i think @dibyom has been discussing this a bit already). if the current code is making this hard, maybe a great first step here would be to refactor the code in prepartion? i.e. make it so that the state of completed TaskRuns is available in teh status of the PipelineRun and doesnt need to be retrieved
  • (2) <-- im not sure we should use the map because think it's hard to understand what it means, e.g.:
    image
    Does this mean when task1 is success OR failure AND when task2 is success? or is there an AND between task1 and task2? <-- and this comes back to: what is the simplest thing we can do next?
  • (3) I don't think there should be a conflicted state; not executing tasks that are meant to execute when when something fails is a totally normal and fine thing
  • (4) can we avoid (skip?) handling skip for now and add it later?

My recommendations for next steps:

  • Start doing any refactoring that can be done now
  • Write up 2-4 in more detail, either in the existing design doc, in the pipeline.md in this PR or in brand new docs so we can look athem. I would prioritize (2) b/c nailing down the syntax will unblock some incremental changes

We're going to be having a beta/workflow meeting on monday at 12pm EST - if you were able to attend @pritidesai we could discuss in more detail then, but if not let's find another time when we could meet and work out the details, if that works for you? If not then writing this down clearly will be more important than ever

Anyway that's my 2 cents and again I am SO EXCITED to get this in - let's figure out what incremental steps we can start making so we can start merging PRs and not leave this big one open for a long time. ❤️

@pritidesai
Copy link
Member Author

Hey @pritidesai and all! I've come into this late and I have to admit it's a bit hard to follow what's going on at this point 😅 Which to me points toward us needing to do a bit more to flesh out the design before we would merge this entirely.

👍

The biggest question on my mind is:

  • What is the smallest step we can take toward this right now, as we continue to hammer out the details?

I've tried to identify a few themes in the discussions so far:

  1. Creating taskrun objects for tasks that have not executed
  2. The syntax for runAfter + runOn
  3. The "conflicted" state
  4. Handling "skipped" tasks

Yup, we need a discussion on these topics, sooner the better.

I suggest we try to tease these apart and tackle them individually:

  • (1) <-- i strongly feel we should NOT be creating taskrun objects for Tasks that are not executing. If the core issue is being able to see the state of all of the tasks in the PipelineRun, I think we can do that by updating the status of the pipelinerun and don't need to create actual taskruns (i think @dibyom has been discussing this a bit already). if the current code is making this hard, maybe a great first step here would be to refactor the code in prepartion? i.e. make it so that the state of completed TaskRuns is available in teh status of the PipelineRun and doesnt need to be retrieved

I have no strong preference for one over the other. The changes in this PR has it implemented such that TaskRun does get created for tasks that are not executing. The core issue otherwise is not being able to retrieve a list of tasks to be executed next for which the parent was not executed. The task will be skipped if parent was skipped but we should be able to add that status in PipelineRun status.

If we change it to not create taskRun, the implementation will change in the following area:
GetNextTasks operates on PipelineRunState which holds the entire PipelineRun after resolving tasks, resources, conditions, etc and most importantly TaskRun registered in reconciler (actual running taskRun). In case of runOn with skipped tasks, GetNextTasks would need to read PipelineRun.Status in addition to PipelineRunState. PipelineRunState has runOn section with tasks and states. And PipelineRun.Status holds taskRun object indicating that the task was skipped. They key issue here is identifying when to retrieve a task from PipelineRunState vs PipelineRun.Status. We could dedicate PipelineRun.State to only return skipped tasks and ignore successful/failed tasks. But I feel, it makes the whole process very complicated iterating over two different data structure and making sure we dont include task which was not suppose to 😞

One example here is:

                  task1
    success /     \ failure
         task2     task3
   failure |          | success
         task4     task5

Let's say, task1 was successful which means task3 and task5 should be marked as skipped since it depends on failure of task1. PipelineRunState wouldn't carry taskRun for task3 and task5 vs PipelineRun.Status would have all five tasks and their status.

  • (2) <-- im not sure we should use the map because think it's hard to understand what it means, e.g.:
    image
    Does this mean when task1 is success OR failure AND when task2 is success? or is there an AND between task1 and task2? <-- and this comes back to: what is the simplest thing we can do next?

😢 yup, there is an AND between task1 and task2. It means execute this task:

if (task1 == success OR task1 == failure) AND task2 == success {
....
}

We could simplify by going back to list but will loose an ability to support this same example 😜:

runAfter: ["task1", "task2"]
runOn: ["success", "failure"]

this means, execute this task if both task1 and task2 are either successful or both have failed 🙃

  • (3) I don't think there should be a conflicted state; not executing tasks that are meant to execute when when something fails is a totally normal and fine thing

humm 🤔, conflicted state is to indicate that the task should be skipped since the states didnt match, I think it needs to rename to something more meaningful, conflicted is used internally to create a skip taskRun (actual taskRun or just object).

  • (4) can we avoid (skip?) handling skip for now and add it later?

Yup, I have it implemented but not added in this PR, its already becoming complicated 😫

My recommendations for next steps:

  • Start doing any refactoring that can be done now
  • Write up 2-4 in more detail, either in the existing design doc, in the pipeline.md in this PR or in brand new docs so we can look at hem. I would prioritize (2) b/c nailing down the syntax will unblock some incremental changes

Yup, syntax is the key, lets nail it down first. This PR has doc here:
https://github.com/tektoncd/pipeline/blob/64d326f29cee582d7f5a75ae6e9e73fe691cc289/docs/pipelines.md#failure-strategies
I will either create a separate PR or update existing one with more details.

We're going to be having a beta/workflow meeting on monday at 12pm EST - if you were able to attend @pritidesai we could discuss in more detail then, but if not let's find another time when we could meet and work out the details, if that works for you? If not then writing this down clearly will be more important than ever

Yup, I will see you during beta/working meeting on Monday 😄

Anyway that's my 2 cents and again I am SO EXCITED to get this in - let's figure out what incremental steps we can start making so we can start merging PRs and not leave this big one open for a long time. ❤️

Thanks a bunch @bobcatfish 😍, yup super excited 💃

@pritidesai
Copy link
Member Author

I feel like creating a separate PR without creating taskRun for skipped tasks before we meet on Monday to better understand the changes, hopefully 🤞I can get it through with my three kids home 👶👶👧

@pritidesai
Copy link
Member Author

pritidesai commented Mar 22, 2020

I have one more PR created to avoid creating Task Run here, PR #2261 🏃‍♀ 🏃

It still shows in the PipelineRun Status but actual TaskRun is not created, this example has the details.

@afrittoli
Copy link
Member

I have one more PR created to avoid creating Task Run here, PR #2261 🏃‍♀ 🏃

It still shows in the PipelineRun Status but actual TaskRun is not created, this example has the details.

Thank you! 🙏

@afrittoli
Copy link
Member

afrittoli commented Mar 23, 2020

Hey @pritidesai and all! I've come into this late and I have to admit it's a bit hard to follow what's going on at this point 😅 Which to me points toward us needing to do a bit more to flesh out the design before we would merge this entirely.

👍

The biggest question on my mind is:

  • What is the smallest step we can take toward this right now, as we continue to hammer out the details?

I've tried to identify a few themes in the discussions so far:

  1. Creating taskrun objects for tasks that have not executed
  2. The syntax for runAfter + runOn
  3. The "conflicted" state
  4. Handling "skipped" tasks

Yup, we need a discussion on these topics, sooner the better.

I suggest we try to tease these apart and tackle them individually:

  • (1) <-- i strongly feel we should NOT be creating taskrun objects for Tasks that are not executing. If the core issue is being able to see the state of all of the tasks in the PipelineRun, I think we can do that by updating the status of the pipelinerun and don't need to create actual taskruns (i think @dibyom has been discussing this a bit already). if the current code is making this hard, maybe a great first step here would be to refactor the code in prepartion? i.e. make it so that the state of completed TaskRuns is available in teh status of the PipelineRun and doesnt need to be retrieved

+1

I have no strong preference for one over the other. The changes in this PR has it implemented such that TaskRun does get created for tasks that are not executing. The core issue otherwise is not being able to retrieve a list of tasks to be executed next for which the parent was not executed. The task will be skipped if parent was skipped but we should be able to add that status in PipelineRun status.

If we change it to not create taskRun, the implementation will change in the following area:
GetNextTasks operates on PipelineRunState which holds the entire PipelineRun after resolving tasks, resources, conditions, etc and most importantly TaskRun registered in reconciler (actual running taskRun). In case of runOn with skipped tasks, GetNextTasks would need to read PipelineRun.Status in addition to PipelineRunState. PipelineRunState has runOn section with tasks and states. And PipelineRun.Status holds taskRun object indicating that the task was skipped. They key issue here is identifying when to retrieve a task from PipelineRunState vs PipelineRun.Status. We could dedicate PipelineRun.State to only return skipped tasks and ignore successful/failed tasks. But I feel, it makes the whole process very complicated iterating over two different data structure and making sure we dont include task which was not suppose to 😞

One example here is:

                  task1
    success /     \ failure
         task2     task3
   failure |          | success
         task4     task5

Let's say, task1 was successful which means task3 and task5 should be marked as skipped since it depends on failure of task1. PipelineRunState wouldn't carry taskRun for task3 and task5 vs PipelineRun.Status would have all five tasks and their status.

  • (2) <-- im not sure we should use the map because think it's hard to understand what it means, e.g.:
    image
    Does this mean when task1 is success OR failure AND when task2 is success? or is there an AND between task1 and task2? <-- and this comes back to: what is the simplest thing we can do next?

😢 yup, there is an AND between task1 and task2. It means execute this task:

if (task1 == success OR task1 == failure) AND task2 == success {
....
}

We could simplify by going back to list but will loose an ability to support this same example 😜:

runAfter: ["task1", "task2"]
runOn: ["success", "failure"]

this means, execute this task if both task1 and task2 are either successful or both have failed 🙃

Personally I really feel that runOn is a specialisation of runAfter. If both runOn and runAfter were specified on a task it would not be very intuitive to understand the resulting rule to be applied, so we should only have runAfter and add the ability to add rules to it.

My preferred starting point for failure strategies would be to have the failure strategy attached to the task that fails: Pipeline would define what is the failure behaviour for each Task. i.e. fail or continue (with default == fail like today). The functionality in runOn would them complement that nicely.

  • (3) I don't think there should be a conflicted state; not executing tasks that are meant to execute when when something fails is a totally normal and fine thing

humm 🤔, conflicted state is to indicate that the task should be skipped since the states didnt match, I think it needs to rename to something more meaningful, conflicted is used internally to create a skip taskRun (actual taskRun or just object).

My point on this was that we don't explicitly track skipped taskruns today, we simply do not run them, and the initial implementation of runOn could do the same. We could add tracking of skipped taskruns into the PipelineRun status as a follow-up PR - but we should then do that for all cases, even when runOn is not used at all, and also for skip in case of conditionals.

  • (4) can we avoid (skip?) handling skip for now and add it later?

Yup, I have it implemented but not added in this PR, its already becoming complicated 😫

My recommendations for next steps:

  • Start doing any refactoring that can be done now
  • Write up 2-4 in more detail, either in the existing design doc, in the pipeline.md in this PR or in brand new docs so we can look at hem. I would prioritize (2) b/c nailing down the syntax will unblock some incremental changes

Yup, syntax is the key, lets nail it down first. This PR has doc here:
https://github.com/tektoncd/pipeline/blob/64d326f29cee582d7f5a75ae6e9e73fe691cc289/docs/pipelines.md#failure-strategies
I will either create a separate PR or update existing one with more details.

We're going to be having a beta/workflow meeting on monday at 12pm EST - if you were able to attend @pritidesai we could discuss in more detail then, but if not let's find another time when we could meet and work out the details, if that works for you? If not then writing this down clearly will be more important than ever

Yup, I will see you during beta/working meeting on Monday 😄

Anyway that's my 2 cents and again I am SO EXCITED to get this in - let's figure out what incremental steps we can start making so we can start merging PRs and not leave this big one open for a long time. ❤️

Thanks a bunch @bobcatfish 😍, yup super excited 💃

@pritidesai
Copy link
Member Author

/hold
We are revisiting the syntax of runOn (list vs map) with the combination of runAfter. Will update this PR once we have the changes in.

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2020
@pritidesai pritidesai changed the title failure strategies using runOn 🎉 WIP - failure strategies using runOn 🎉 Mar 24, 2020
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2020
@bobcatfish
Copy link
Collaborator

Closing this for now as we are revisiting the design, see #1684 (comment) for discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants