Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskRun.Status.TaskSpec is sometimes overwritten without replacements applied #5574

Closed
abayer opened this issue Sep 28, 2022 · 6 comments · Fixed by #5576
Closed

TaskRun.Status.TaskSpec is sometimes overwritten without replacements applied #5574

abayer opened this issue Sep 28, 2022 · 6 comments · Fixed by #5576
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@abayer
Copy link
Contributor

abayer commented Sep 28, 2022

Discovered this due to TestPipelineRunStatusSpec/pipeline_status_spec_updated e2e test being flaky:

init_test.go:138: Create namespace arendelle-t9lb4 to deploy to
    init_test.go:166: Verify SA "default" is created in namespace "arendelle-t9lb4"
    pipelinerun_test.go:110: Setting up test resources for "pipeline status spec updated" test in namespace arendelle-t9lb4
    pipelinerun_test.go:120: Waiting for PipelineRun pipeline-task-update in namespace arendelle-t9lb4 to complete
    pipelinerun_test.go:124: Making sure the expected TaskRuns [task1] were created
    pipelinerun_test.go:164: Checking if parameter replacements have been updated in the spec.
    pipelinerun_test.go:171: Expected replaced parameter value : Hello World! in Script: #!/usr/bin/env bash
        echo "$(params.HELLO)"
         But not found

Seen on https://prow.tekton.dev/view/gs/tekton-prow/pr-logs/pull/tektoncd_pipeline/5573/pull-tekton-pipeline-alpha-integration-tests/1575135529287028736 among others.

/kind bug

@tekton-robot tekton-robot added the kind/flake Categorizes issue or PR as related to a flakey test label Sep 28, 2022
@abayer
Copy link
Contributor Author

abayer commented Sep 28, 2022

So the resulting pod has the right script - the parameter replacement is performed there. But for some reason I have yet to figure out, the updated TaskSpec used to create the pod is not also being saved to tr.Status.TaskSpec after the replacement has happened.

@abayer
Copy link
Contributor Author

abayer commented Sep 28, 2022

Ok, some hacky debug logging later, it turns out that tr.Status.TaskSpec.Steps[0].Script is getting updated properly, but we only do the applyParamsContextsResultsAndWorkspaces when there isn't already a pod, and are (somehow, still not exactly sure how) overwriting tr.Status.TaskSpec with every reconcile, so that the original TaskSpec keeps getting put back.

@abayer
Copy link
Contributor Author

abayer commented Sep 28, 2022

Yup, more debug logging discoveries: sometimes, the first call to taskrun.go's reconcile after the tr.Status.TaskSpec has been updated sees a nil tr.Status.TaskSpec. When that's the case, we end up with the parameter value unreplaced in tr.Status.TaskSpec.Steps[0].Script. So what's making that be nil?

The only thing I've found so far that differs is that when we end up with a nil tr.Status.TaskSpec, the pipelinerun has been reconciled between reconcile calls to the TaskRun, but I'm not sure if that's just coincidence on a small sample size.

@abayer
Copy link
Contributor Author

abayer commented Sep 28, 2022

Ok, that pattern seems to be consistent - if the PipelineRun is reconciled after the TaskRun reconciliation that creates the pod and updates tr.Status.TaskSpec, but before the next reconciliation of that TaskRun, tr.Status.TaskSpec is nil at the start of that second TaskRun reconciliation and the "original" TaskSpec is written to tr.Status.TaskSpec. That strongly suggests something in the PipelineRun reconciliation, at that specific time, is resulting in wiping tr.Status.TaskSpec...

@abayer
Copy link
Contributor Author

abayer commented Sep 28, 2022

EDIT: ok, major correction this time - I misread the logs. Nothing actually differs in terms of order of operation after all.

But here's the order anyway, for posterity's sake.

  • reconcile is called for the TaskRun is reconciled
  • A Pending event gets broadcast for the TaskRun
  • The generated reconciler.go code for the TaskRun calls updater.UpdateStatus with the correct tr.Status.TaskSpec
  • Found a TaskRun pipeline-task-update-task1 that was missing from the PipelineRun status shows up from the PipelineRun reconciler
  • The PipelineRun is reconciled
  • The TaskRun reconciler.go calls updater.UpdateStatus again, still with the correct tr.Status.TaskSpec
  • A Started event gets broadcast for the TaskRun
  • reconcile is called for the TaskRun again - when the test's failing, tr.Status.TaskSpec is now nil

@abayer
Copy link
Contributor Author

abayer commented Sep 28, 2022

Ok, pretty sure this is due to 9cf7590, thanks to @wlynch noticing a new chains failure (tektoncd/chains#577) between Pipeline v0.40.0 and v0.40.1 - that's the exact code I've been looking at for this flake, but I somehow didn't notice that it had just been moved. 🤦

@abayer abayer changed the title TestPipelineRunStatusSpec/pipeline_status_spec_updated e2e test is flaky TaskRun.Status.TaskSpec is sometimes overwritten without replacements applied Sep 28, 2022
@abayer abayer added kind/bug Categorizes issue or PR as related to a bug. and removed kind/flake Categorizes issue or PR as related to a flakey test labels Sep 28, 2022
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Sep 28, 2022
fixes tektoncd#5574

With tektoncd#5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Sep 28, 2022
fixes tektoncd#5574

With tektoncd#5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
abayer added a commit to abayer/tektoncd-pipeline that referenced this issue Sep 29, 2022
fixes tektoncd#5574

With tektoncd#5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
tekton-robot pushed a commit that referenced this issue Sep 29, 2022
fixes #5574

With #5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
tekton-robot pushed a commit to tekton-robot/pipeline that referenced this issue Sep 29, 2022
fixes tektoncd#5574

With tektoncd#5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
tekton-robot pushed a commit that referenced this issue Sep 30, 2022
fixes #5574

With #5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants