access execution status of any task in finally #3390

Merged: 1 commit into tektoncd:master on Jan 12, 2021

Conversation

@pritidesai (Member) commented Oct 15, 2020

Changes

Introducing a variable that can be used to access the execution status of any pipelineTask in a pipeline. It can be used in any finally task (and is limited to finally tasks for this first iteration).

Use $(tasks.<pipelineTask>.status) as a param value; it resolves to one of Succeeded, Failed, or None.

E.g.:

    finally:
    - name: finaltask
      params:
        - name: task1Status
          value: "$(tasks.task1.status)"
      taskSpec:
        params:
          - name: task1Status
        steps:
          - image: ubuntu
            name: print-task-status
            script: |
              if [ "$(params.task1Status)" = "Failed" ]
              then
                echo "Task1 has failed, now process this failure"
              fi

Partially Closes #1020
Implements TEP #0028

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.


Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Introducing a variable $(tasks.<taskName>.status) to access the execution status of any non-finally pipelineTask from a finally task.

/kind feature

@tekton-robot added the release-note, kind/feature, and size/L labels Oct 15, 2020
@tekton-robot (Collaborator) commented:

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/reconciler/pipelinerun/resources/apply.go | 91.5% | 92.3% | 0.8
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 86.8% | 82.3% | -4.5

@tekton-robot (Collaborator) commented:

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/reconciler/pipelinerun/resources/apply.go | 91.5% | 92.3% | 0.8
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go | 69.9% | 72.5% | 2.5

@bobcatfish (Collaborator) commented Oct 15, 2020

hey @pritidesai !! I'm very excited to see this moving forward :D

Do you think this might be worth a TEP to explore the problem and the alternatives a bit?

For example, the mapping between the "execution states" and "status" you're proposing (#1020 (comment)) is interesting; it's also different from what you currently see in the status field of an executing TaskRun/PipelineRun, so I think it might be worth contrasting this with some different approaches, e.g.:

  1. Exposing the entire status of a Run as metadata, using something like JSON path to extract values (I think we've looked at this before and it's pretty verbose, but I think it's worth considering)
  2. Exposing just the execution status information, but mirroring the structure in the actual status of the run (e.g. instead of context.pipelineRun.Tasks.Foo, context.pipelineRun.Tasks.status.conditions) <-- we could return either Unknown, True, False, "" or make them slightly more readable: Running, Succeeded, Failed, None. (Edited: I had thought .Succeeded was the complete path but now I realize this is a value!)

Looking at the docs on the execution status, it feels like this proposal is trying to make the "reason" field available to users, and treating that as the overall status (and mixing 'skipped' in as well).

Unknown: no information available on that pipelineTask execution

This one in particular is confusing: "unknown" as a condition status currently means "running" but we are using "unknown" to mean "not set" (i think?)
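
For context, here is a minimal, hypothetical sketch (not code from this PR; the helper name is made up) of how the three statuses of the knative Succeeded condition are commonly read today, which is exactly the ambiguity being discussed:

    package example

    import "knative.dev/pkg/apis"

    // describeSucceeded spells out the usual reading of the Succeeded condition:
    // Unknown generally means "still running", while a missing condition means
    // the status has not been set yet.
    func describeSucceeded(c *apis.Condition) string {
        switch {
        case c == nil:
            return "not set"
        case c.IsUnknown():
            return "running"
        case c.IsTrue():
            return "succeeded"
        default:
            return "failed"
        }
    }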

@bobcatfish (Collaborator) commented:

One more thing that might be worth exploring: some of these statuses won't have meaning or be available. For example:

  • Failed - only available in Finally Tasks (all other Tasks will stop executing)
  • TaskRunCancelled - I think that even finally tasks are not executed if a task is cancelled, so I don't think realistically that this value will ever be present?

For all of these, the values will only be available to tasks that execute during or after the task that is trying to use the values.

@pritidesai (Member, Author) commented Oct 15, 2020

hey @pritidesai !! I'm very excited to see this moving forward :D

🎉 🎉 🎉

Do you think this might be worth a TEP to explore the problem and the alternatives a bit?

Yup, definitely 👍

For example the mapping between the "execution states" and "status" you're proposing (#1020 (comment)) is interesting - it's also different from what you currently see in the status field of an executing TaskRun/PipelineRun so I think it might be worth contrasting this with some different approaches, e.g.:

The use cases we have here are:

  • UC1: A finally task must be able to check the execution state of any DAG task.
  • UC2: A DAG task must be able to check whether another DAG task was skipped. The reason to include only skipped here is that, as you mentioned in the comment above ⬆️, if any DAG task fails, the pipeline starts executing finally tasks and does not schedule the rest of the DAG tasks. I need to confirm the cancelled scenario.

The proposed list of states here is derived from ResolvedPipelineTask.TaskRun.Status.GetCondition(apis.ConditionSucceeded), i.e. the Succeeded ConditionType together with its Status and Reason, the same methods that we use to get the execution queue in pipelinerun.go. The state is also derived from pipelineRun.Status.SkippedTasks for tasks that do not have any taskRun associated with them, to distinguish them from not-yet-scheduled tasks. (A rough sketch of this mapping follows the list below.)

  • Succeeded: the pipelineTask was successful; the respective pod was created and completed successfully. The condition status is set to true.

  • Failed: the pipelineTask failed; the respective pod was created but exited with an error. The condition status is set to false and all retries have been exhausted.

  • TaskRunCancelled: the taskRun for that pipelineTask was cancelled by the user. The condition status is set to false and the reason is set to TaskRunCancelled.

  • Skipped: the pipelineTask was skipped because of its when expressions or because its parent was skipped.

  • Unknown: this one is crucial and ambiguous; it is essentially none of the above. The pipelineTask was not skipped but has no taskRun associated with it, and there is no information on whether it will be scheduled next or left unscheduled due to a DAG failure.
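
A rough, hypothetical sketch of this mapping (illustrative only, not the code in this PR; the function name and the exact cancellation reason string are assumptions):

    package example

    import (
        "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
        "knative.dev/pkg/apis"
    )

    // taskExecutionState derives one of the five proposed states from the
    // TaskRun's Succeeded condition plus the pipelineRun's skipped-task info.
    func taskExecutionState(tr *v1beta1.TaskRun, skipped bool) string {
        if skipped {
            return "Skipped"
        }
        if tr == nil {
            // No taskRun yet: either not scheduled yet or never will be.
            return "Unknown"
        }
        c := tr.Status.GetCondition(apis.ConditionSucceeded)
        switch {
        case c.IsTrue():
            return "Succeeded"
        case c.IsFalse() && c.Reason == "TaskRunCancelled":
            return "TaskRunCancelled"
        case c.IsFalse():
            return "Failed" // includes runs that exhausted their retries
        default:
            return "Unknown"
        }
    }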

  1. Exposing the entire status of a Run as metadata, using something like json path to extract values (i think we've looked at this before and it's pretty verbose but i think it's worth considering)

👍 This can be next in line and can enhance the existing proposal.

The existing proposal brings quick access to the state without investing in JSON path.

  1. Exposing just the execution status information, but mirroring the structure in the actual status of the run (e.g. instead of context.pipelineRun.Tasks.Foo.Succeeded, context.pipelineRun.Tasks.status.conditions.Succeeded) <-- we could return either Unknown, True, False, ""

context.pipelineRun.Tasks.status.conditions.Succeeded with Unknown, True, False would not bring much value, since False can mean failed or cancelled (which is still fine), but it will be difficult to capture skipped.

or make them slightly more readable Running, Succeeded, Failed, None

We can definitely rename the states to Running, Succeeded, Failed, None, making them more readable. And add Skipped?

Looking at the docs on the execution status, it feels like this proposal is trying to make the "reason" field available to users, and treating that as the overall status (and mixing 'skipped' in as well)

Yup, so that it's not hidden in None or Unknown; the core focus is accessing the state of a pipelineTask. The docs on the execution status are more at the pipeline level rather than at the individual task level. A pipelineTask has an extra state (skipped) which is not applicable to a pipeline; a pipeline does not have a skipped state.

Unknown: no information available on that pipelineTask execution

This one in particular is confusing: "unknown" as a condition status currently means "running" but we are using "unknown" to mean "not set" (i think?)

It means there is absolutely no way to determine the status at that point in time. We could introduce a Running state, which is derivable, but Unknown could mean none of the above proposed states.

Within finally, a DAG task would have this state when one of its predecessors fails and the pipeline stops executing any new DAG tasks.

Within the DAG, this state wouldn't be returned unless the state of a child is requested before its parent has executed or finished executing. While a parent is executing, we have no idea what is going to happen to it or whether that child will ever be scheduled/executed.

So worth a TEP 😉

@pritidesai (Member, Author) commented:

I need to confirm the cancelled scenario.

When any DAG task is cancelled, the pipeline triggers finally tasks.

@pritidesai (Member, Author) commented:

Do you think this might be worth a TEP to explore the problem and the alternatives a bit?

tektoncd/community#234

@bobcatfish (Collaborator) commented:

The proposed list of states here is derived from ResolvedPipelineTask.TaskRun.Status.GetCondition(apis.ConditionSucceeded), i.e. the Succeeded ConditionType together with its Status and Reason, the same methods that we use to get the execution queue in pipelinerun.go. The state is also derived from pipelineRun.Status.SkippedTasks for tasks that do not have any taskRun associated with them, to distinguish them from not-yet-scheduled tasks.

Thanks for the detailed explanation! I'm going to take my comments mostly into the TEP; in general I'm wondering if there is a simpler subset of these values we could start with initially. Maybe, to keep things simpler, start with just the expected values of the Succeeded ConditionType; then we can be deliberate about adding more if needed.

context.pipelineRun.Tasks.status.conditions.Succeeded with Unknown, True, False would not bring much value, since False can mean failed or cancelled (which is still fine), but it will be difficult to capture skipped.

At the moment I don't think there's a scenario where a task is cancelled and other tasks will start to execute after that. If there is later on, we could add additional information to the variable replacement to cover it.

A pipeline does not have skipped state.

Oh yeah great point! imo this is even more reason not to include this as one of the possible values of the new field (more in my TEP comments!)

When any DAG task is cancelled, pipeline triggers finally tasks.

Oh, do you mean when a taskrun that is part of a pipelinerun is cancelled? I was thinking of when the entire PipelineRun is cancelled.

I guess I could see someone doing that but it seems like a strange scenario to support - but in the TEP it looks like "cancelled" isn't being proposed initially so maybe we can cross that bridge when we come to it :D

@tekton-robot added the needs-rebase label Oct 27, 2020
@tekton-robot added the size/XL label and removed the needs-rebase and size/L labels Oct 28, 2020
@tekton-robot (Collaborator) commented:

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/substitution/substitution.go | 43.8% | 40.0% | -3.8

@tekton-robot (Collaborator) commented:

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/apis/pipeline/v1beta1/pipeline_validation.go | 99.6% | 99.6% | 0.0
pkg/reconciler/pipelinerun/pipelinerun.go | 82.7% | 82.8% | 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go | 64.1% | 65.5% | 1.4
pkg/substitution/substitution.go | 43.8% | 48.1% | 4.3

@tekton-robot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot added the approved label Jan 11, 2021

@afrittoli (Member) left a comment

Thank you @pritidesai !

This looks good; the only problem left that I see is the parameter variable validation, which assumes that the parameter is in the format "$(var)" only.

I would prefer not checking explicitly for cancel/timeout, since that code will never be reached anyway and we have no tests for it. It should be a quick fix; if you want to update the PR, it can still make it into v0.20.0.

s = v1beta1.TaskRunReasonSuccessful.String()
case t.IsFailure():
s = v1beta1.TaskRunReasonFailed.String()
case t.IsCancelled():
Review comment (Member):

Cancelled is a special case of failed. IsCancelled returns true only if the condition is false (==failure) and the reason is a special one. There may be a case of tasks for which cancelled has been requested but they have not been cancelled yet, but those are not covered by the IsCancelled check, as it should be - for them we would return None.

s = v1beta1.TaskRunReasonFailed.String()
case t.IsCancelled():
s = v1beta1.TaskRunReasonFailed.String()
case t.TaskRun != nil && t.TaskRun.HasTimedOut(ctx):
Review comment (Member):

Timeout should be a special case of failed. A Task that has timed out is failed.

This check does not verify that the condition is set to false, only that the task has timed out, so it includes tasks not yet completed, in the sense that the controller has not enforced the timeout yet.

I think we should only read the condition, and return Failed if it is false, Succeeded if it is true, or None otherwise.
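
A hypothetical sketch of that suggestion (the helper name is made up; this is not the code that was merged):

    package example

    import (
        "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
        "knative.dev/pkg/apis"
    )

    // statusFromCondition consults only the Succeeded condition: cancelled and
    // timed-out runs surface as Failed once the controller has actually set the
    // condition to False; anything else (not started, still running) is None.
    func statusFromCondition(c *apis.Condition) string {
        switch {
        case c.IsTrue():
            return v1beta1.TaskRunReasonSuccessful.String()
        case c.IsFalse():
            return v1beta1.TaskRunReasonFailed.String()
        default:
            return "None"
        }
    }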

@@ -60,6 +60,22 @@ func ValidateVariableP(value, prefix string, vars sets.String) *apis.FieldError
return nil
}

func ValidateVariablePS(value, prefix string, suffix string, vars sets.String) *apis.FieldError {
Review comment (Member):

NIT: it would be nice to have a comment that describes what the function does (here and for several other functions in this module).

}
}
for _, paramValue := range paramValues {
if strings.HasPrefix(stripVarSubExpression(paramValue), "tasks.") && strings.HasSuffix(stripVarSubExpression(paramValue), ".status") {
Review comment (Member):

NIT: we don't need to invoke stripVarSubExpression twice here.

@@ -60,6 +60,22 @@ func ValidateVariableP(value, prefix string, vars sets.String) *apis.FieldError
return nil
}

func ValidateVariablePS(value, prefix string, suffix string, vars sets.String) *apis.FieldError {
if vs, present := extractVariablesFromString(value, prefix); present {
Review comment (Member):

Thanks @pritidesai - I do not see the check in this function - perhaps you refer to https://github.com/tektoncd/pipeline/pull/3390/files#diff-78ff878240189b20e0ef82f2912c93c0b3c7b445865e04fa9b98aab851be6130R321 (L321)? However, that check will only work for parameters in the "$(var)" format, not if extra text or more variables are included.

@tekton-robot added the needs-rebase label Jan 12, 2021
Introducing a variable which can be used to access the execution status of any pipelineTask in a pipeline. Use $(tasks.<pipelineTask>.status) as a param value, which contains one of Succeeded, Failed, or None.
@tekton-robot removed the needs-rebase label Jan 12, 2021
@tekton-robot (Collaborator) commented:

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/apis/pipeline/v1beta1/pipeline_validation.go | 99.6% | 99.6% | 0.0
pkg/reconciler/pipelinerun/pipelinerun.go | 82.0% | 82.1% | 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 92.0% | 91.7% | -0.3
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go | 64.1% | 66.3% | 2.2
pkg/substitution/substitution.go | 43.8% | 48.1% | 4.3

@pritidesai (Member, Author) commented:

thanks @afrittoli, I have created an issue on validation as per our Slack conversation and updated the checks to check the Succeeded condition type for status true and false instead of explicitly checking for cancelled or timed out. Are there any changes needed for this to make it into 0.20?

@afrittoli (Member) left a comment

Thanks for all the updates and the patience!
It's great to have this new feature in Tekton :)
/lgtm

@@ -87,6 +87,17 @@ func ApplyTaskResults(targets PipelineRunState, resolvedResultRefs ResolvedResul
}
}

//ApplyPipelineTaskContext replaces context variables referring to execution status with the specified status
Review comment (Member):

NIT: missing space // Apply...

@@ -132,6 +132,18 @@ func (t ResolvedPipelineRunTask) IsStarted() bool {
return t.TaskRun != nil && t.TaskRun.Status.GetCondition(apis.ConditionSucceeded) != nil
}

// IsConditionStatusFalse returns true when a task has succeeded condition with status set to false
// it includes task failed after retries are exhausted, cancelled tasks, and time outs
func (t ResolvedPipelineRunTask) IsConditionStatusFalse() bool {
Review comment (Member):

Nice, thank you!

@@ -60,6 +60,22 @@ func ValidateVariableP(value, prefix string, vars sets.String) *apis.FieldError
return nil
}

func ValidateVariablePS(value, prefix string, suffix string, vars sets.String) *apis.FieldError {
if vs, present := extractVariablesFromString(value, prefix); present {
Review comment (Member):

I think a reasonable fix here could be to have a variation of extractVariablesFromString that returns the stripped suffix so that it could be verified.
We should follow up on this PR to add such a fix to make sure the code does not break when we add support for results, like @GregDritschler mentioned.
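
One possible shape for that variation, as a self-contained sketch (the function name, signature, and regex are illustrative assumptions, not the eventual fix):

    package example

    import (
        "regexp"
        "strings"
    )

    // extractVariablesWithSuffix finds every $(prefix.<name>) expression in value
    // and returns each variable name with the expected suffix stripped, together
    // with a flag saying whether that suffix was actually present, so callers can
    // verify it.
    func extractVariablesWithSuffix(value, prefix, suffix string) map[string]bool {
        re := regexp.MustCompile(`\$\(` + regexp.QuoteMeta(prefix) + `\.([^)]+)\)`)
        vars := map[string]bool{}
        for _, m := range re.FindAllStringSubmatch(value, -1) {
            name := m[1]
            vars[strings.TrimSuffix(name, "."+suffix)] = strings.HasSuffix(name, "."+suffix)
        }
        return vars
    }

For a value like "$(tasks.task1.status)" with prefix "tasks" and suffix "status", this would return {"task1": true}; a value missing the suffix would map to false and could be rejected by the validation.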

@tekton-robot added the lgtm label Jan 12, 2021
@tekton-robot merged commit 23b37ac into tektoncd:master Jan 12, 2021
@riceluxs1t commented:

@pritidesai

Regarding $(tasks.<pipelineTask>.status): how would a PipelineRun that has its Pipeline/Task specs fully inlined access these parameters? I believe that for a fully inlined PipelineRun, tasks get random suffixes attached to their names.

@pritidesai (Member, Author) commented Jan 26, 2021

@riceluxs1t please provide an example YAML if possible.

$(tasks.<pipelineTask>.status) represents the execution status of a pipelineTask under the tasks section and is only accessible in finally tasks. For a pipelineRun with an inlined pipeline/task spec, these parameters stay the same, since the pipelineTask names are specified by the users and are not updated by the controller.

Here is an example of pipelineRun with inlined specification: https://github.com/tektoncd/pipeline/blob/master/examples/v1beta1/pipelineruns/pipelinerun-task-execution-status.yaml

I believe for a fully inlined PipelineRun, tasks get random suffixes attached to their names.

Are you seeing suffixes getting attached to pipelineTask names?

jerop added a commit to jerop/community that referenced this pull request Jun 3, 2021
jerop added a commit to jerop/community that referenced this pull request Jun 3, 2021
tekton-robot pushed a commit to tektoncd/community that referenced this pull request Jun 3, 2021
Labels: approved, kind/feature, lgtm, release-note, size/XL
Projects: None yet
Development: Successfully merging this pull request may close these issues: Expose pipeline run metadata to Tasks and Conditions
7 participants