Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where PipelineRun emits task results that were never produced #3472

Merged
merged 1 commit into from Jan 12, 2021
Merged

Fix bug where PipelineRun emits task results that were never produced #3472

merged 1 commit into from Jan 12, 2021

Conversation

ghost
Copy link

@ghost ghost commented Oct 29, 2020

Changes

Fixes #3465

Prior to this commit a Pipeline Result that references a Task Result of a failed TaskRun would end up with a value of the literal variable. For example: if Task "foo" failed, a Pipeline Result with value of "$(tasks.foo.results.bar)" would end up in the PipelineRun.Status.Results list with the literal value, "$(tasks.foo.results.bar)". It was therefore difficult to assess programatically whether results were populated correctly or were the result of some invalid TaskRun condition.

This commit fixes the bug by filtering out PipelineRun Results that reference failed TaskRuns. It was quite difficult to follow the flow of execution wrt PipelineRun Results and so ultimately I had to refactor the whole lot to figure out where the bug was. The final code is quite a bit shorter than the original and has improved test coverage to more robustly exercise the behaviour of PipelineResults in various failure scenarios.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)
Fix bug where PipelineRun emits task results that were never produced. Previously a Pipeline Result that contained an invalid variable would be added to the PipelineRun with that unreplaced variable intact. Now a Pipeline Result that contains an invalid variable will not be emitted by the PipelineRun at all.

@ghost ghost added the kind/bug Categorizes issue or PR as related to a bug. label Oct 29, 2020
@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 29, 2020
@tekton-robot tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 29, 2020
Comment on lines 577 to 587
If a `Pipeline Result` references a failed `TaskRun` then the `Pipeline Result` is
considered invalid and will not be emitted by the completed `PipelineRun`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the result could also be missing because the task was skipped as well...not only when it failed...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also successful taskrun without producing task result 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the result could also be missing because the task was skipped as well...not only when it failed...

Ah, good catch, thanks @jerop, updated docs to expand the list of reasons that a pipeline result would be invalid.

Regarding the successful taskrun w/o producing task result, I think that's accounted for - see my other comment.

@pritidesai
Copy link
Member

@sbwsg I haven't looked at the code yet but we might get into a scenario where task could succeed without producing any result and the same literal might end up in pipelineRun results.

@ghost
Copy link
Author

ghost commented Oct 30, 2020

@sbwsg I haven't looked at the code yet but we might get into a scenario where task could succeed without producing any result and the same literal might end up in pipelineRun results.

Thanks for the feedback @pritidesai ! I believe this is accounted for - here's the code that decides on the validity of a PipelineResult's variable reference and a new test case that exercises the behaviour of a PipelineResult referencing a Task Result that is not produced by the TaskRun. The result variable will be considered invalid and the Pipeline Result won't be emitted. LMK if you see edge cases I'm not covering here - I want to account for as many of these rules as possible with unit tests so that we can better catch regressions.

c.updatePipelineResults(ctx, pr, getPipelineFunc)
if _, pipelineSpec, err := resources.GetPipelineData(ctx, pr, getPipelineFunc); err != nil {
logger.Errorf("Failed to get Pipeline Spec to process Pipeline Results: %v", err)
pr.Status.MarkFailed(ReasonCouldntGetPipeline,
Copy link
Author

@ghost ghost Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for @pritidesai or @afrittoli - given that we are in a block guarded by pr.IsDone(), is it too late to call pr.Status.MarkFailed()? I hadn't noticed this prior to making these changes but the old code did this too (see updatePipelineResults in the diff).

Copy link
Member

@pritidesai pritidesai Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, pr.IsDone returns true when the pipeline is either successful or failed. markFailed can switch the success to failure with setting its completionTime. This can definitely cause issue since the pipeline would be marked as finished with the completionTime but it's still running.

Its missing call to finishReconcileUpdateEmitEvents which will make sure the pipeline actually finishes and doesnt continue execution after MarkFailed.

		pr.Status.MarkFailed(ReasonCouldntGetPipeline, "Failed to get Pipeline Spec to process Pipeline Results %s/%s: %s", pr.Namespace, pr.Name, err)
		return c.finishReconcileUpdateEmitEvents(ctx, pr, before, nil)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this brings us back to similar discussion with task results, all the tasks executed successfully but a particular pipeline result which was being referenced doesn't exist so declare pipeline failed 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove the call to MarkFailed here I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the call to MarkFailed and now instead am using the controller's Recorder (if available) to record a warning event (in addition to logging an error from the controller).

Copy link
Contributor

@GregDritschler GregDritschler Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, as part of the Run integration with pipelines I tried to have the e2e test verify the pipeline results. I had to take that out because there was a race condition between the test case waking up (upon seeing the PipelineRun was done) and the controller updating the PipelineRun with the results. I think the results should be added to the PR at the same time it is marked done. I'm just bringing this up for your awareness; it's up to you whether you want to tackle it in this PR.

Copy link
Member

@pritidesai pritidesai Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point @GregDritschler

yup, we dont have to tackle everything in this PR, sorry @sbwsg for polluting this PR.

If I am not mistaken, taskRun has the similar behavior where a taskRun is marked as successful along with its completion time before populating task results (might be fixed with #3497):

pipeline/pkg/pod/status.go

Lines 272 to 282 in 05d67e4

func updateCompletedTaskRunStatus(logger *zap.SugaredLogger, trs *v1beta1.TaskRunStatus, pod *corev1.Pod) {
if DidTaskRunFail(pod) {
msg := getFailureMessage(logger, pod)
MarkStatusFailure(trs, msg)
} else {
MarkStatusSuccess(trs)
}
// update tr completed time
trs.CompletionTime = &metav1.Time{Time: time.Now()}
}

pipeline/pkg/pod/status.go

Lines 109 to 113 in 05d67e4

complete := areStepsComplete(pod) || pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
if complete {
updateCompletedTaskRunStatus(logger, trs, pod)
} else {

After a taskRun is marked as successful, taskRun status is updated with task results (through setTaskRunStatusBasedOnStepStatus):

pipeline/pkg/pod/status.go

Lines 162 to 164 in 05d67e4

if tr.IsSuccessful() {
trs.TaskRunResults = append(trs.TaskRunResults, taskResults...)
trs.ResourcesResult = append(trs.ResourcesResult, pipelineResourceResults...)

may be worth a discussion in API WG? 😜

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I'm going to back out the change and re-introduce MarkFailed since it's the existing behaviour and we can discuss further.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - MarkFailed is re-introduced.

- a `TaskRun` referenced by the `Pipeline Result` was skipped
- a `TaskRun` referenced by the `Pipeline Result` didn't emit the referenced `Task Result`
- the `Pipeline Result` uses a variable that doesn't point to a task in the Pipeline
- the `Pipeline Result` uses a variable that doesn't point to a result of a task in the Pipeline
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbwsg continuation of #3497 - these all seem like errors to me

  • the Pipeline Result uses a variable that doesn't point to a task in the Pipeline
  • the Pipeline Result uses a variable that doesn't point to a result of a task in the Pipeline

If I'm understanding correctly, these both mean that the Pipeline was created incorrectly - we could catch these at validation time before we start actually executing the Pipeline

  1. a TaskRun referenced by the Pipeline Result failed <-- this would fail the pipelinerun anyway
  2. a TaskRun referenced by the Pipeline Result was skipped
  3. a TaskRun referenced by the Pipeline Result didn't emit the referenced Task Result

These would all be at runtime - I could imagine that someone might be annoyed if their entire Pipeline ran successfully but one of the above caused a pipeline level result not to be set - however I think that could be addressed by a combo of:

  1. Tasks that claim to emit results but don't should fail #3497 <-- case (3) would fail after the Task ran
  2. TEP-0048: task results without results - problem statement community#240 <-- we could include this use case if we want to be able to support defaults in the case of (2) (and maybe 3 if we let a Task specify a default result?)

Copy link
Author

@ghost ghost Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on all points. Validation should catch incorrect variable usage and Tasks that don't emit any described results should fail, causing the Pipeline to also fail.

Regarding the validation, we do have this - it runs during DAG scheduling, so when we decide what to run next we find an invalid result reference and error the PipelineRun with InvalidTaskResultReference.

we could catch these at validation time before we start actually executing the Pipeline

We could push this earlier and validate all Task result references before execution begins but we would need to fetch and inspect every Task referenced by a result variable before starting the Pipeline. Otherwise we don't know what the correct result names are. So it's feasible and do-able. I think we need to decide if that's preferable vs whatever time it costs to fetch all the referenced Tasks and inspect their results before running the Pipeline.

Reading back I am realizing this morning that I was wrong in my comment on #3497. A PipelineRun was never put into a failed state just because a Pipeline Result reference is invalid. A PipelineRun is marked failed if it's unable to fetch the PipelineSpec needed to process Pipeline Results. Otherwise the way it's worked before and after this change is that missing or invalid results don't update the final Success/Fail state of the PipelineRun.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created #3499 to explore the possibility and costs of fetching and validating task result references before PipelineRun execution starts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still a bit confused about both what the current behavior is and what the behavior will be after this PR - would it make sense to maybe make this section of the docs also indicate what behavior we expect in each of these casess?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and open to merge this since there is no functionality change compared to the existing behavior. And continue this discussion with #3499.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the doc with more detail about the accompanying PipelineRun state when these conditions are hit.

Value: "$(tasks.pt1.results.foo)",
}, {
Name: "pipeline-result-2",
Value: "$(tasks.pt1.results.foo), $(tasks.pt2.results.baz), $(tasks.pt1.results.bar), $(tasks.pt2.results.baz), $(tasks.pt1.results.foo)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbwsg while your are updating the doc, might be worth mentioning this particular scenario.

I think without these changes (this PR):

"$(tasks.pt1.results.foo), $(tasks.pt2.results.baz), $(tasks.pt1.results.bar), $(tasks.pt2.results.baz), $(tasks.pt1.results.foo)"

if $(tasks.pt2.results.baz) was not initialized, this was resulting in:

"do, $(tasks.pt2.results.baz), mi, $(tasks.pt2.results.baz), do"

(I haven't confirmed yet)

But now with these changes, its not set at all and not part of the results any more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup just confirmed, on master, its producing "do, $(tasks.pt2.results.baz), mi, $(tasks.pt2.results.baz), do" 😢

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pritidesai 👍 right, that's the bad behaviour we're aiming to fix here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a Note after the list of reasons that a Pipeline Result will not be emitted. The note talks about a Pipeline Result that references multiple Taskrun Results and makes clear that any invalid reference from the Pipeline Result will mean the entire result is not emitted.

// pointer is returned if the variable is invalid for any reason.
func taskResultValue(variable string, taskStatuses map[string]*v1beta1.PipelineRunTaskRunStatus) *string {
variableParts := strings.Split(variable, ".")
if len(variableParts) != 4 || variableParts[0] != "tasks" || variableParts[2] != "results" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: tasks and results could be constants and shared with other result related logic (not part of this PR), can be done seperately

return &result, nil
}
}
return nil, fmt.Errorf("Could not find result with name %s for task run %s", reference.Result, reference.PipelineTask)
Copy link
Member

@pritidesai pritidesai Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is getting lost and not surfaced to user or logs, here is the chain of function calls: findTaskResultForPipelineResult <- resolveResultRefForPipelineResult <- extractResultRefsForPipelineResults <- convertPipelineResultToResultRefs

convertPipelineResultToResultRefs returns nil if error is not nil without logging it or returning the error along but not sure how the result reference as is makes it to the pipelineResults.

With, #3499, will able to pick up such errors and report them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is the reason that I ultimately decided the new taskResultValue() func shouldn't bother returning an error and just a pointer to the string value if there is one.

Copy link
Member

@pritidesai pritidesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a bunch @sbwsg for fixing this issue and appreciate all the cleanup along. It's much easier to understand now.

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2020
Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbwsg is it possible to update the release note to explain what the new expected behavior is? "Fix bug where PipelineRun emits task results that were never produced" <-- not clear to me what will happen now (from some discussions in the PR it sounds like this is not actually changing the functionality? but i think the update is that now task results that cannot be satisfied with be ""? Or won't be present at all?)

anyway in general it seems like this is better than the current state and it seems like a nice refactoring of the existing code - hopefully we can keep improving the error handling around results!

/approve

This will result in an `InvalidTaskResultReference` validation error during execution of the `Pipeline`.

**Note:** Since a `Pipeline Result` can contain references to multiple `Task Results`, if any of those
`Task Result` references are invalid the entire `Pipeline Result` is not emitted.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think as a user i might find it hard to reason about what will happen based on this description, maybe it would be clearer to flip it around a bit, and i think it might be more clear to a user to say something like "task" or "pipeline task" instead of "task run":

PipelineRun execution will fail if a pipeline result requires any of the following:

  • A result from a Task that is skipped
  • A result from a Task that it declares but when executed it doesn't actually emit it. This should be considered a bug in the Task and may fail a TaskRun in future.
  • A result from a Task that it does not declare
  • A result from a Task that does not exit in the Pipeline

I didn't include these ones b/c i'm not sure if they WILL have that same error:

  • A TaskRun referenced by the Pipeline Result failed. The PipelineRun will also have failed. <-- since execution will stop in this case, i dont think there would be an error related to the result?

Copy link
Author

@ghost ghost Nov 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've updated the section to reference "Pipeline Task" throughout instead of "TaskRun".

Inre: your comments on PipelineRun execution behaviour, I think there are at least two discrete questions we're trying to answer in this secion of the doc:

  1. What is the final "shape" of a pipeline result if it contains an invalid reference?
  2. What happens to pipelinerun execution when pipeline results contain these invalid references?

I think that (1) is answered by the existing PR. Personally I think that (2) is independent of the results and shouldn't be documented in a section titled Emitting Results from a Pipeline since it's orthogonal: The Pipeline Results don't have any bearing on the execution status of the PipelineRun. That doesn't mean we shouldn't document it but it does seem to me a little out-of-scope to try and document PipelineRun execution state in a section dedicated to result data.

PipelineRun execution will fail if a pipeline result requires any of the following:

  • A result from a Task that is skipped

PipelineRun execution doesn't fail when a Task is skipped and a result references it. This is kinda an example of why I'm a tad hesitant to include documentation about pipelinerun execution state in a section dedicated to result data. I think this will come back to bite us if we try and document it here. One day we'll change the behaviour of PipelineRuns and we'll have to remember that we documented pipelinerun execution in multiple places that aren't pipelineruns.md and then comb back through to make sure all those places reflect the changed behaviour. Personally I'd prefer that we keep the documentation in pipelines.md as tightly focused as possible on how Pipeline Results behave.

// list of PipelineResults, returning the computed set of PipelineRunResults. References to
// non-existent TaskResults or failed TaskRuns result in a PipelineResult being considered invalid
// and omitted from the returned slice. A nil slice is returned if no results are passed in or all
// results are invalid.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i apologize for probably having already asked this, but it seems weird to me that we just ignore the invalid values, feels like we should be returning an error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's quite right to say that we ignore the invalid values. We return a nil pointer if the value is invalid - in other words, no valid value was found for the result reference.

I don't totally agree that this is an error state but I don't feel strongly enough to dispute it. What should we do with that error when it's returned? In the old code it was silently ignored.

Copy link
Member

@pritidesai pritidesai Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like we should be returning an error?

Returning an error might break pipelineRuns. There was no error reported in logs or to users at all. If we introduce error now, pipelineRuns which used to be successful will start failing with this error. And at the same time, the least we could do is log the error if needed but still exit with success.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should eventually move the execution of this logic to before the pipelinerun state is set.
That way could still fail the pipeline if results are broken or invalid in some way.
Maybe something for next release.

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish, pritidesai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [bobcatfish,pritidesai]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ghost
Copy link
Author

ghost commented Nov 30, 2020

Updated release note to specify the old and new behaviour.

@ghost
Copy link
Author

ghost commented Nov 30, 2020

but i think the update is that now task results that cannot be satisfied with be ""? Or won't be present at all?)

Yup, that's exactly right. Before the result would be emitted by the Pipeline with "$(task.foo.results.bar)" left in it unreplaced.

@ghost
Copy link
Author

ghost commented Nov 30, 2020

Just to make things a tad more concrete, here's the yaml from a PipelineRun of a pipeline result that references a skipped task result:

  status:
    completionTime: "2020-11-30T15:55:17Z"
    conditions:
    - lastTransitionTime: "2020-11-30T15:55:17Z"
      message: 'Tasks Completed: 3 (Failed: 0, Cancelled 0), Skipped: 3'
      reason: Completed
      status: "True"
      type: Succeeded
    pipelineResults:
    - name: bar
      value: $(tasks.task-should-be-skipped-1.results.bar)

Notice that: 1) the PipelineRun is in a Succeeded/True condition and 2) the pipelineResults have the variable syntax printed verbatim because the referenced task was skipped. (2) is the thing we're actually attempting to fix here. We don't want variable syntax showing up in the pipeline results. Instead we don't want the pipeline result to be emitted at all.

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 84.4% 83.8% -0.6
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 92.6% 92.6% -0.0

@ghost ghost added this to the Pipelines v0.19 milestone Nov 30, 2020
@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 5, 2020
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 82.7% 82.1% -0.6
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 91.0% 73.4% -17.5

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 82.7% 82.1% -0.6
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 91.0% 77.0% -13.9

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 82.7% 82.1% -0.6
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 91.0% 81.7% -9.2

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 82.7% 82.1% -0.6
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 91.0% 91.3% 0.3

Prior to this commit a Pipeline Result that references a Task Result of a failed TaskRun
would end up with a value of the literal variable. For example: if Task "foo" failed,
a Pipeline Result with value of "$(tasks.foo.results.bar)" would end up in the
PipelineRun.Status.Results list with the literal value, "$(tasks.foo.results.bar)".
It was therefore difficult to assess programatically whether results were populated
correctly or were the result of some invalid TaskRun condition.

This commit fixes the bug by filtering out PipelineRun Results that reference failed
TaskRuns. It was quite difficult to follow the flow of execution wrt PipelineRun Results
and so ultimately I had to refactor the whole lot to figure out where the bug was. The
final code is quite a bit shorter than the original and has improved test coverage to
more robustly exercise the behaviour of PipelineResults in various failure scenarios.
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 82.7% 82.0% -0.8
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 91.0% 91.3% 0.3

@afrittoli
Copy link
Member

/test pull-tekton-pipeline-integration-tests

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this, it is really good step forward.
And thanks for moving a lot of the results code out of pipelinerun.go!!
/lgtm

// list of PipelineResults, returning the computed set of PipelineRunResults. References to
// non-existent TaskResults or failed TaskRuns result in a PipelineResult being considered invalid
// and omitted from the returned slice. A nil slice is returned if no results are passed in or all
// results are invalid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should eventually move the execution of this logic to before the pipelinerun state is set.
That way could still fail the pipeline if results are broken or invalid in some way.
Maybe something for next release.

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2021
@ghost
Copy link
Author

ghost commented Jan 12, 2021

Looks like integration failed due to timeouts when communicating with the api server

/test pull-tekton-pipeline-integration-tests

@ghost
Copy link
Author

ghost commented Jan 12, 2021

/test pull-tekton-pipeline-integration-tests

@ghost
Copy link
Author

ghost commented Jan 12, 2021

The integration job was marked as failed while the build log said it was still running. Not entirely sure what's up so tried again.

@ghost
Copy link
Author

ghost commented Jan 12, 2021

2021/01/12 17:45:35 process.go:153: Running: gcloud compute networks create tpipeline-e2e-net1349049644675502080 --project=tekton-prow-14 --subnet-mode=auto
ERROR: (gcloud.compute.networks.create) Could not fetch resource:
 - Internal error. Please try again or contact Google Support. (Code: '5B8B7971BCA3B.AD38666.49063965')

Hmm. Might just need to sit and wait a bit for things to settle before rerunning.

@ghost
Copy link
Author

ghost commented Jan 12, 2021

/test pull-tekton-pipeline-integration-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PipelineRun emitting task results that were never produced
6 participants