panic when submitting TaskRun #2220

Closed
poy opened this issue Mar 12, 2020 · 6 comments · Fixed by #2222
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


poy commented Mar 12, 2020

Expected Behavior

TaskRun to complete successfully

Actual Behavior

Controller panics

Steps to Reproduce the Problem

  1. This happens when I create a TaskRun that references a ClusterTask.

Additional Info

  • Kubernetes version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.17", GitCommit:"bdceba0734835c6cb1acbd1c447caf17d8613b44", GitTreeState:"clean", BuildDate:"2020-01-17T23:10:13Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

v0.10.1

The panic from the controller logs:

panic: runtime error: index out of range [5] with length 5

goroutine 258 [running]:
github.com/tektoncd/pipeline/pkg/pod.(*stepStateSorter).changeIndex(...)
	github.com/tektoncd/pipeline/pkg/pod/status.go:386
github.com/tektoncd/pipeline/pkg/pod.(*stepStateSorter).Swap(0xc000caad40, 0x2, 0x1)
	github.com/tektoncd/pipeline/pkg/pod/status.go:393 +0x514
sort.insertionSort(0x1b7c1a0, 0xc000caad40, 0x0, 0x5)
	sort/sort.go:28 +0x57
sort.quickSort(0x1b7c1a0, 0xc000caad40, 0x0, 0x5, 0x6)
	sort/sort.go:209 +0x201
sort.Sort(0x1b7c1a0, 0xc000caad40)
	sort/sort.go:218 +0x79
github.com/tektoncd/pipeline/pkg/pod.sortTaskRunStepOrder(0xc000fea240, 0x5, 0x8, 0xc000ab8000, 0x7, 0x8, 0x2, 0x0, 0x0)
	github.com/tektoncd/pipeline/pkg/pod/status.go:357 +0xb5
github.com/tektoncd/pipeline/pkg/pod.MakeTaskRunStatus(0x0, 0x0, 0x0, 0x0, 0xc000828630, 0x2c, 0x0, 0x0, 0xc0008286f0, 0x29, ...)
	github.com/tektoncd/pipeline/pkg/pod/status.go:177 +0x4f7
github.com/tektoncd/pipeline/pkg/reconciler/taskrun.(*Reconciler).reconcile(0xc000106750, 0x1b8e3a0, 0xc0001f01e0, 0xc000aadb80, 0xed5fc68ab, 0x27380c0)
	github.com/tektoncd/pipeline/pkg/reconciler/taskrun/taskrun.go:360 +0x115e
github.com/tektoncd/pipeline/pkg/reconciler/taskrun.(*Reconciler).Reconcile(0xc000106750, 0x1b8e3a0, 0xc0001f01e0, 0xc0009bfc80, 0x56, 0xc0000b9e00, 0x1b8e3a0)
	github.com/tektoncd/pipeline/pkg/reconciler/taskrun/taskrun.go:153 +0x841
knative.dev/pkg/controller.(*Impl).processNextWorkItem(0xc0002e4600, 0x0)
	knative.dev/pkg@v0.0.0-20191111150521-6d806b998379/controller/controller.go:335 +0x654
knative.dev/pkg/controller.(*Impl).Run.func1(0xc0009224f0, 0xc0002e4600)
	knative.dev/pkg@v0.0.0-20191111150521-6d806b998379/controller/controller.go:285 +0x53
created by knative.dev/pkg/controller.(*Impl).Run
	knative.dev/pkg@v0.0.0-20191111150521-6d806b998379/controller/controller.go:283 +0x1ac

I've managed to modify the ./pkg/pod/status_test.go file (TestSortTaskRunStepOrder) to recreate the error:

diff --git a/pkg/pod/status_test.go b/pkg/pod/status_test.go
index 9215e116..3f96a41a 100644
--- a/pkg/pod/status_test.go
+++ b/pkg/pod/status_test.go
@@ -635,6 +635,10 @@ func TestSidecarsReady(t *testing.T) {
 func TestSortTaskRunStepOrder(t *testing.T) {
 	steps := []v1alpha1.Step{{Container: corev1.Container{
 		Name: "hello",
+	}}, {Container: corev1.Container{
+		Name: "extra-1",
+	}}, {Container: corev1.Container{
+		Name: "extra-2",
 	}}, {Container: corev1.Container{
 		Name: "exit",
 	}}, {Container: corev1.Container{

Then run:

go test ./pkg/pod --run TestSortTaskRunStepOrder -v
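
To illustrate the failure mode outside the test suite, here is a minimal, self-contained Go sketch. It is not Tekton's actual stepStateSorter, just the same class of bug: a sort whose Swap also indexes into a shorter companion slice panics once the sort reaches the extra entries.

package main

import (
	"fmt"
	"sort"
)

// stepSorter mimics the failure mode: Len comes from the list of step names,
// but Swap also indexes into the (shorter) list of step states.
// Illustrative only; not the real Tekton sorter.
type stepSorter struct {
	names  []string // five entries, like the Task's steps
	states []int    // fewer entries, like the truncated step statuses
}

func (s stepSorter) Len() int           { return len(s.names) }
func (s stepSorter) Less(i, j int) bool { return s.names[i] < s.names[j] }
func (s stepSorter) Swap(i, j int) {
	s.names[i], s.names[j] = s.names[j], s.names[i]
	// Panics with "index out of range" once i or j exceeds len(s.states)-1.
	s.states[i], s.states[j] = s.states[j], s.states[i]
}

func main() {
	s := stepSorter{
		names:  []string{"hello", "extra-1", "extra-2", "exit", "world"},
		states: []int{0, 1, 2}, // mismatched length triggers the panic
	}
	sort.Sort(s) // panics during Swap, just like the controller did
	fmt.Println(s.names)
}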
project-bot added this to Needs triage in Tekton Pipelines Mar 12, 2020
@ghost added the kind/bug and needs-cherry-pick labels Mar 12, 2020

ghost commented Mar 12, 2020

I can reproduce with the modifications to the unit test you mentioned, but I'm curious whether you see this with all ClusterTasks. I'm not able to reproduce with the trivial ClusterTask example in examples/v1beta1/taskruns/clustertask.yaml.

I have a fix that seems to work OK for the unit test example, but I'd really like to see an example of a ClusterTask / TaskRun that's hitting the same error so that I can confirm my fix works for that too. Can you nail down what specifically in the ClusterTask configuration is the source of the problem, or provide your ClusterTask for me to test?


ghost commented Mar 12, 2020

(Even just copy-pasting the output of describe taskrun would be helpful; I'm trying to understand where the mismatch between the number of steps and the number of step states is coming from.)


poy commented Mar 12, 2020

So it looks like the problem happens when a step fails.


ghost commented Mar 13, 2020

Woo this is a doozy.

The image digest exporter (part of the Image Output Resource) is configured with "terminationMessagePolicy": "FallbackToLogsOnError".

When a previous step has failed in the Task, our entrypoint wrapping the exporter emits the following log line: 2020/03/13 12:03:26 Skipping step because a previous step failed.

That line gets read by the Tekton controller, which is looking for JSON in the termination message. It fails to parse the message from the image digest exporter and stops trying to read step statuses (here).

That results in a mismatch between the length of the list of steps and the length of the list of step statuses, and finally our sort method panics with an out-of-bounds error because it assumes the two lists are the same length.

I'm working on a couple of fixes for this and will make a PR later today.
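
As a rough sketch of the kind of guard that would avoid the panic (hypothetical types and function, not the actual change that landed in #2222): skip the reordering entirely when the steps and step states don't line up.

package pod

// StepState is a stand-in for the TaskRun step status type; hypothetical,
// with just enough structure for the sketch.
type StepState struct {
	Name string
}

// sortStepStates reorders states to match stepNames, but only when every
// step has a corresponding state. Sketch only; not the actual fix in #2222.
func sortStepStates(stepNames []string, states []StepState) []StepState {
	if len(states) != len(stepNames) {
		// Lengths diverge when a termination message couldn't be parsed;
		// skip sorting rather than index past the end of either list.
		return states
	}
	index := make(map[string]int, len(stepNames))
	for i, name := range stepNames {
		index[name] = i
	}
	sorted := make([]StepState, len(states))
	for _, s := range states {
		sorted[index[s.Name]] = s
	}
	return sorted
}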

@bobcatfish moved this from Needs triage to Low priority in Tekton Pipelines Mar 13, 2020
@bobcatfish moved this from Low priority to In Progress in Tekton Pipelines Mar 13, 2020
@ghost removed the needs-cherry-pick label Mar 16, 2020
Tekton Pipelines automation moved this from In Progress to Closed Mar 17, 2020

ghost commented Mar 17, 2020

Reopening until this is backported to 0.10

@ghost reopened this Mar 17, 2020
Tekton Pipelines automation moved this from Closed to Needs triage Mar 17, 2020
@ghost mentioned this issue Mar 17, 2020
@dibyom moved this from Needs triage to High priority in Tekton Pipelines Mar 18, 2020
@dibyom moved this from High priority to Low priority in Tekton Pipelines Mar 18, 2020
@dibyom moved this from Low priority to In Progress in Tekton Pipelines Mar 18, 2020

ghost commented Mar 30, 2020

Marking as closed now that we've backported to 0.10 and released in 0.11.

@ghost closed this as completed Mar 30, 2020
Tekton Pipelines automation moved this from In Progress to Closed Mar 30, 2020