Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskRun status wrong when unnamed step fails (followed by named step) #2415

Closed
AlanGreene opened this issue Apr 16, 2020 · 7 comments
Closed
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@AlanGreene
Copy link
Member

Expected Behavior

TaskRun status should correctly identify the failed step.

Actual Behavior

When an unnamed step fails and it is followed by a named step, the TaskRun status incorrectly identifies the named step as the reason for the failure.

Steps to Reproduce the Problem

kubectl create -f - << EOF
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: test-unnamed-correct-
spec:
  taskSpec:
    steps:
    - image: ubuntu
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 0
    - image: ubuntu
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 1
        false
    - image: ubuntu
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 2
    - image: ubuntu
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 3
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: test-unnamed-wrong-
spec:
  taskSpec:
    steps:
    - image: ubuntu
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 0
    - image: ubuntu
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 1
        false
    - image: ubuntu
      name: named-step
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 2
    - image: ubuntu
      script: |
        #!/usr/bin/env bash
        sleep 3 && echo step 3
EOF

In both cases 'step 1' fails (expected).
The TaskRun status for test-unnamed-correct- identifies 'step 1' correctly (unnamed-1 in this case).
test-unnamed-wrong- incorrectly identifies 'step 2' ('named-step') as the culprit.

Status:
  Conditions:
    Message:               "step-named-step" exited with code 1 (image: "docker-pullable://ubuntu@sha256:bec5a2727be7fff3d308193cfde3491f8fba1a2ba392b7546b43a051853a341d"); for logs run: kubectl -n default logs test-unnamed-wrong-8l6sd-pod-dfsn6 -c step-named-step

Additional Info

  • Kubernetes version:

    Output of kubectl version:

    Client Version: v1.16.3
    Server Version: v1.15.5
    
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

    v0.11.1

@vdemeester
Copy link
Member

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 16, 2020
@vincent-pli
Copy link
Member

Checked the code, we sort the pod.Status.ContainerStatuses with FinishedAt, then get the first one whose exit code is not 0.

pipeline/pkg/pod/status.go

Lines 265 to 274 in 6b1579c

sort.Slice(podInstance.Status.ContainerStatuses, func(i, j int) bool {
var ifinish, jfinish time.Time
if term := podInstance.Status.ContainerStatuses[i].State.Terminated; term != nil {
ifinish = term.FinishedAt.Time
}
if term := podInstance.Status.ContainerStatuses[j].State.Terminated; term != nil {
jfinish = term.FinishedAt.Time
}
return ifinish.Before(jfinish)
})

Sometimes the FinishedAt are the same for several steps, then the sort will do nothing, that's means the container(step) with same FinishedAt will sort by its name. that's not expected.

Could we sort with step defined in task? @vdemeester

@AlanGreene
Copy link
Member Author

Sounds like there might be some overlap with #2416

I wonder is there a reason we're sorting on the finish time instead of start time? It looks like we don't have enough precision in the recorded times either way to rely on times for accurate sorting.

@vincent-pli
Copy link
Member

May be we can use FinishedAt and StartAt together.

@GregDritschler
Copy link
Contributor

Same discussion about this problem in #2029 but it ground to a halt looking for a solution. Using resolution of seconds (which is all that k8 api server has for finish time) doesn't work. Tekton is managing its own start times so that might work with a higher resolution.

@vincent-pli
Copy link
Member

@GregDritschler
I checked all comments in #2029, very helpful, thanks.
What do you think to introduce StartAt for the sorting when FinishedAt are exactly the same.
Since the goal is to find the first failed step, the StartAt and FinishedAt are most simple and directly solution.
Moreover, as you mentioned tekton control the StartAt, we can adopt a higher resolution one.

vincent-pli added a commit to vincent-pli/pipeline that referenced this issue Apr 21, 2020
Fix issue: tektoncd#2415

Introduce `StartAt` for the sorting when `FinishedAt` are exactly the same.
Since the goal is to find the first failed step, the StartAt and FinishedAt are most simple and directly solution.
Moreover, adopt a higher resolution format for `StartAt` to make it more accurately.
vincent-pli added a commit to vincent-pli/pipeline that referenced this issue May 21, 2020
Fix issue: tektoncd#2415

Introduce `StartAt` for the sorting when `FinishedAt` are exactly the same.
Since the goal is to find the first failed step, the StartAt and FinishedAt are most simple and directly solution.
Moreover, adopt a higher resolution format for `StartAt` to make it more accurately.
vincent-pli added a commit to vincent-pli/pipeline that referenced this issue May 21, 2020
Fix issue: tektoncd#2415

Introduce `StartAt` for the sorting when `FinishedAt` are exactly the same.
Since the goal is to find the first failed step, the StartAt and FinishedAt are most simple and directly solution.
Moreover, adopt a higher resolution format for `StartAt` to make it more accurately.
tekton-robot pushed a commit that referenced this issue May 21, 2020
Fix issue: #2415

Introduce `StartAt` for the sorting when `FinishedAt` are exactly the same.
Since the goal is to find the first failed step, the StartAt and FinishedAt are most simple and directly solution.
Moreover, adopt a higher resolution format for `StartAt` to make it more accurately.
@vdemeester
Copy link
Member

This is fixed by #2455, closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

5 participants