TaskRun retries not working #1944

Closed
steveodonovan opened this issue Jan 27, 2020 · 6 comments · Fixed by #1971
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@steveodonovan
Member

steveodonovan commented Jan 27, 2020

Expected Behavior

Pipelines whose tasks have retries set should rerun a TaskRun when it fails.

Actual Behavior

The failed TaskRun is not rerun.

Steps to Reproduce the Problem

  1. Run the PipelineRun below.
  2. Watch the resources: they complete too quickly to have run the 30-second sleep.

Additional Info

Below are the sample resources, a watch of the TaskRuns, and the TaskRun after it has completed.

Note that the tasks complete too quickly to have actually executed, and the retried status entries are all duplicates. On previous versions I see this work as expected, with a different pod per task retry. I can't find any issue or PR that explicitly changes the pod-per-retry behaviour, which implies the change might be accidental.
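
The duplicate-entry symptom described above can be checked mechanically. The following is a minimal Python sketch, not part of Tekton: it assumes the TaskRun has been loaded as a dict (for example from `kubectl get taskrun <name> -o json`) and that the status uses the `podName` and `retriesStatus` keys. The helper name `reused_pod_names` is hypothetical.

```python
from collections import Counter


def reused_pod_names(taskrun: dict) -> dict:
    """Return pod names that are recorded for more than one attempt."""
    status = taskrun.get("status", {})
    # Assumption: earlier attempts are listed under status.retriesStatus and
    # the latest attempt is described by the top level of status.
    attempts = list(status.get("retriesStatus", [])) + [status]
    counts = Counter(a["podName"] for a in attempts if a.get("podName"))
    return {name: n for name, n in counts.items() if n > 1}


# Trimmed-down example mirroring the failing TaskRun in this report.
example = {
    "status": {
        "podName": "sample-piperun-3-task2-zfpzn-pod-glmb2",
        "retriesStatus": [
            {"podName": "sample-piperun-3-task2-zfpzn-pod-glmb2"},
            {"podName": "sample-piperun-3-task2-zfpzn-pod-glmb2"},
        ],
    }
}

print(reused_pod_names(example))
```

An empty result means every attempt got its own pod, which is what the earlier versions appear to do. A non-empty result matches the behaviour reported here.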

Sample resources to reproduce

apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: task1
spec:
  steps:
    - name: task-one-step-one
      image: ubuntu
      command: ["/bin/bash"]
      args: ['-c', 'for i in {1..2}; do echo GONLOGGIN 1 && sleep 1; done;'] 
---
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: task2
spec:
  steps:
    - name: task-one-step-one
      image: ubuntu
      command: ["/bin/bash"]
      args: ['-c', 'sleep 30'] 
    - name: task-one-step-two
      image: ubuntu
      command: ['bash']
      args: ['-c', 'return 1;'] 
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: 999-working-logs-long
spec:
  tasks:
  - name: task1
    taskRef:
      name: task1
  - name: task2
    retries: 8
    taskRef:
      name: task2
---
apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: sample-piperun-3
spec:
  pipelineRef:
    name: 999-working-logs-long

Watching

stephen@stephens-mbp tektonResources minikube  $ k get taskRuns --watch
NAME                                 SUCCEEDED   REASON      STARTTIME   COMPLETIONTIME
cutting-down-trees-run-task1-zqqlh   True        Succeeded   37m         36m
cutting-down-trees-run-task2-2q8fw   False       Failed      36m         36m
sample-piperun-1-task1-g9dqp         True        Succeeded   2m59s       2m42s
sample-piperun-1-task2-g2ntp         False       Failed      2m16s       2m15s
sample-piperun-2-task1-dnjtv         True        Succeeded   2m6s        114s
sample-piperun-2-task2-qcfzx         False       Failed      75s         75s
sample-piperun-3-task1-t9gfh         Unknown     Pending     5s          
sample-piperun-3-task2-zfpzn         Unknown     Pending     5s          
sample-piperun-task1-sw76k           True        Succeeded   33m         32m
sample-piperun-task2-lghp8           False       Failed      32m         32m





sample-piperun-3-task1-t9gfh         Unknown     Running     13s         
sample-piperun-3-task1-t9gfh         True        Succeeded   17s         0s
sample-piperun-3-task2-zfpzn         Unknown     Running     17s         
sample-piperun-3-task2-zfpzn         False       Failed      49s         0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      1s          0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      0s          0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      0s          0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      0s          0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      0s          0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      0s          0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      1s          0s
sample-piperun-3-task2-zfpzn         Unknown                             
sample-piperun-3-task2-zfpzn         False       Failed      0s          0s

Retried taskRun

Name:         sample-piperun-3-task2-zfpzn
Namespace:    default
Labels:       app.kubernetes.io/managed-by=tekton-pipelines
              tekton.dev/pipeline=999-working-logs-long
              tekton.dev/pipelineRun=sample-piperun-3
              tekton.dev/pipelineTask=task2
              tekton.dev/task=task2
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"tekton.dev/v1alpha1","kind":"Task","metadata":{"annotations":{},"name":"task2","namespace":"default"},"spec":{"steps":[{"ar...
API Version:  tekton.dev/v1alpha1
Kind:         TaskRun
Metadata:
  Creation Timestamp:  2020-01-27T13:39:04Z
  Generation:          1
  Owner References:
    API Version:           tekton.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PipelineRun
    Name:                  sample-piperun-3
    UID:                   267d99ff-0a2f-41d9-8f71-548123ca0355
  Resource Version:        111445
  Self Link:               /apis/tekton.dev/v1alpha1/namespaces/default/taskruns/sample-piperun-3-task2-zfpzn
  UID:                     afdaf65a-2fec-486f-b5d7-63f8758f4869
Spec:
  Inputs:
  Outputs:
  Service Account Name:  
  Task Ref:
    Kind:   Task
    Name:   task2
  Timeout:  1h0m0s
Status:
  Completion Time:  2020-01-27T13:39:55Z
  Conditions:
    Last Transition Time:  2020-01-27T13:39:55Z
    Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
    Reason:                Failed
    Status:                False
    Type:                  Succeeded
  Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
  Retries Status:
    Completion Time:  2020-01-27T13:39:53Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:53Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:04Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
    Completion Time:   2020-01-27T13:39:54Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:54Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:53Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
    Completion Time:   2020-01-27T13:39:54Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:54Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:54Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
    Completion Time:   2020-01-27T13:39:54Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:54Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:54Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
    Completion Time:   2020-01-27T13:39:54Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:54Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:54Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
    Completion Time:   2020-01-27T13:39:54Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:54Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:54Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
    Completion Time:   2020-01-27T13:39:54Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:54Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:54Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
    Completion Time:   2020-01-27T13:39:55Z
    Conditions:
      Last Transition Time:  2020-01-27T13:39:55Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-3-task2-zfpzn-pod-glmb2
    Start Time:              2020-01-27T13:39:54Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
        Exit Code:     0
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Completed
        Started At:    2020-01-27T13:39:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
        Exit Code:     1
        Finished At:   2020-01-27T13:39:53Z
        Reason:        Error
        Started At:    2020-01-27T13:39:53Z
  Start Time:          2020-01-27T13:39:55Z
  Steps:
    Container:  step-task-one-step-one
    Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
    Name:       task-one-step-one
    Terminated:
      Container ID:  docker://23bb2bbd4f8fed6a06709fc8fce317e4a9437735b619233c1ef8e4234ab9f7fb
      Exit Code:     0
      Finished At:   2020-01-27T13:39:53Z
      Reason:        Completed
      Started At:    2020-01-27T13:39:23Z
    Container:       step-task-one-step-two
    Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
    Name:            task-one-step-two
    Terminated:
      Container ID:  docker://439d9bc35fa764ffcc7289b222c93d459a0303b4a1cd4f50759f09d5f959b50c
      Exit Code:     1
      Finished At:   2020-01-27T13:39:53Z
      Reason:        Error
      Started At:    2020-01-27T13:39:53Z
Events:
  Type     Reason  Age                    From                Message
  ----     ------  ----                   ----                -------
  Warning  Failed  4m13s (x9 over 4m15s)  taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-3-task2-zfpzn-pod-glmb2 -c step-task-one-step-two
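
To make the "too quick to have run the 30 second sleep" observation concrete, here is a small Python sketch that computes the duration of the first three attempts from the Start Time and Completion Time values in the output above. The 30-second threshold comes from the `sleep 30` step. The helper name `attempt_seconds` and the labels it prints are illustrative assumptions.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
MIN_EXPECTED_SECONDS = 30  # task2's first step sleeps for 30 seconds


def attempt_seconds(start: str, end: str) -> float:
    """Return the elapsed seconds between two RFC 3339 UTC timestamps."""
    started = datetime.strptime(start, TIME_FORMAT)
    finished = datetime.strptime(end, TIME_FORMAT)
    return (finished - started).total_seconds()


# (Start Time, Completion Time) of the first three Retries Status entries.
attempts = [
    ("2020-01-27T13:39:04Z", "2020-01-27T13:39:53Z"),
    ("2020-01-27T13:39:53Z", "2020-01-27T13:39:54Z"),
    ("2020-01-27T13:39:54Z", "2020-01-27T13:39:54Z"),
]

for start, end in attempts:
    seconds = attempt_seconds(start, end)
    verdict = "plausible" if seconds >= MIN_EXPECTED_SECONDS else "too fast"
    print(f"{start} -> {end}: {seconds:.0f}s ({verdict})")
```

Only the first attempt (49s) is long enough to have run the sleep. The later ones finish in 1s or less, which is consistent with no new pod being started.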
@vdemeester
Member

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 27, 2020
@vdemeester vdemeester added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jan 27, 2020
@vdemeester vdemeester added this to the Pipelines 0.10.1 🐱 milestone Jan 27, 2020
@steveodonovan
Member Author

0.9.2 behaviour

The watch

sample-piperun-on-zero-nine-task1-rdzjm   Unknown     Pending     7s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     8s          
sample-piperun-on-zero-nine-task1-rdzjm   Unknown     Running     10s         
sample-piperun-on-zero-nine-task1-rdzjm   True        Succeeded   15s         0s
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     19s         
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     51s         
sample-piperun-on-zero-nine-task2-27dbl   False       Failed      52s         0s
sample-piperun-on-zero-nine-task2-27dbl   Unknown                             
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     0s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     0s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     2s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     14s         
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     45s         
sample-piperun-on-zero-nine-task2-27dbl   False       Failed      46s         0s
sample-piperun-on-zero-nine-task2-27dbl   Unknown                             
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     1s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     1s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     3s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     8s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     40s         
sample-piperun-on-zero-nine-task2-27dbl   False       Failed      42s         1s
sample-piperun-on-zero-nine-task2-27dbl   Unknown                             
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     0s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     0s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     2s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     7s          
sample-piperun-on-zero-nine-task2-27dbl   False       Failed      40s         0s
sample-piperun-on-zero-nine-task2-27dbl   Unknown                             
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     1s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     1s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Pending     2s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     8s          
sample-piperun-on-zero-nine-task2-27dbl   Unknown     Running     39s         
sample-piperun-on-zero-nine-task2-27dbl   False       Failed      41s         1s

Pods being spun up for each retry

default            sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79          0/2     Init:0/1    0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79          0/2     PodInitializing   0          2s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79          2/2     Running           0          14s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79          2/2     Running           0          14s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79          1/2     Running           0          45s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79          0/2     Completed         0          46s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          0/2     Pending           0          1s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          0/2     Pending           0          1s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          0/2     Init:0/1          0          1s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          0/2     PodInitializing   0          3s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          2/2     Running           0          8s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          2/2     Running           0          8s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          1/2     Running           0          40s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be          0/2     Completed         0          41s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348          0/2     Pending           0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348          0/2     Pending           0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348          0/2     Init:0/1          0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348          0/2     PodInitializing   0          2s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348          2/2     Running           0          7s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348          2/2     Running           0          7s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348          0/2     Completed         0          40s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          0/2     Pending           0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          0/2     Pending           0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          0/2     Init:0/1          0          1s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          0/2     PodInitializing   0          2s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          2/2     Running           0          8s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          2/2     Running           0          8s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          1/2     Running           0          39s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45          0/2     Completed         0          40s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e          0/2     Pending           0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e          0/2     Pending           0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e          0/2     Init:0/1          0          0s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e          0/2     PodInitializing   0          2s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e          2/2     Running           0          10s
default            sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e          2/2     Running           0          10s

The taskRun

Name:         sample-piperun-on-zero-nine-task2-27dbl
Namespace:    default
Labels:       tekton.dev/pipeline=999-working-logs-long
              tekton.dev/pipelineRun=sample-piperun-on-zero-nine
              tekton.dev/pipelineTask=task2
              tekton.dev/task=task2
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"tekton.dev/v1alpha1","kind":"Task","metadata":{"annotations":{},"name":"task2","namespace":"default"},"spec":{"steps":[{"ar...
API Version:  tekton.dev/v1alpha1
Kind:         TaskRun
Metadata:
  Creation Timestamp:  2020-01-27T13:56:59Z
  Generation:          1
  Owner References:
    API Version:           tekton.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PipelineRun
    Name:                  sample-piperun-on-zero-nine
    UID:                   e8eb77dc-f096-467f-b5ce-b0eff574c15c
  Resource Version:        113735
  Self Link:               /apis/tekton.dev/v1alpha1/namespaces/default/taskruns/sample-piperun-on-zero-nine-task2-27dbl
  UID:                     1321e98e-807a-4b5d-a3fc-cc8a32b90cf9
Spec:
  Inputs:
  Outputs:
  Pod Template:
  Service Account Name:  
  Task Ref:
    Kind:   Task
    Name:   task2
  Timeout:  1h0m0s
Status:
  Conditions:
    Last Transition Time:  2020-01-27T14:03:07Z
    Message:               Not all Steps in the Task have finished executing
    Reason:                Running
    Status:                Unknown
    Type:                  Succeeded
  Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-8f66c7
  Retries Status:
    Completion Time:  2020-01-27T13:57:51Z
    Conditions:
      Last Transition Time:  2020-01-27T13:57:51Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-b07a14 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-b07a14
    Start Time:              2020-01-27T13:56:59Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://1b74b1d724280923f5f192b79a4ad59bfb5084aa1934cac92513b1af18abb577
        Exit Code:     0
        Finished At:   2020-01-27T13:57:49Z
        Reason:        Completed
        Started At:    2020-01-27T13:57:14Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://f61e93fb1b44a2440f7594d77785015a69919c66f8c36d9072c2b2e92694b6cc
        Exit Code:     1
        Finished At:   2020-01-27T13:57:50Z
        Reason:        Error
        Started At:    2020-01-27T13:57:17Z
    Completion Time:   2020-01-27T13:58:37Z
    Conditions:
      Last Transition Time:  2020-01-27T13:58:37Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79
    Start Time:              2020-01-27T13:57:51Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://5311e69d1bbec8002609ad29037c09aad08db7d0800a494e38b79c963c6f3b66
        Exit Code:     0
        Finished At:   2020-01-27T13:58:36Z
        Reason:        Completed
        Started At:    2020-01-27T13:57:59Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://136732492b24c1bb07c40599f58f1841a489e5f8778978ff88453e45f0c4aea3
        Exit Code:     1
        Finished At:   2020-01-27T13:58:37Z
        Reason:        Error
        Started At:    2020-01-27T13:58:05Z
    Completion Time:   2020-01-27T13:59:18Z
    Conditions:
      Last Transition Time:  2020-01-27T13:59:18Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be
    Start Time:              2020-01-27T13:58:37Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://46885efb1b6fe860c70cde5624ffb0b92c3bccba52d22edc4a52f362c0b20040
        Exit Code:     0
        Finished At:   2020-01-27T13:59:17Z
        Reason:        Completed
        Started At:    2020-01-27T13:58:42Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://26c4e84314464e697f1b637ba3cf5105f45ef0d3d0d2d722427fe9e01dc48baa
        Exit Code:     1
        Finished At:   2020-01-27T13:59:17Z
        Reason:        Error
        Started At:    2020-01-27T13:58:44Z
    Completion Time:   2020-01-27T13:59:59Z
    Conditions:
      Last Transition Time:  2020-01-27T13:59:59Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348
    Start Time:              2020-01-27T13:59:19Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://13dcccf053415f50c2658f32977441e6ca4993d26eb1cd2764f587b812424e4d
        Exit Code:     0
        Finished At:   2020-01-27T13:59:58Z
        Reason:        Completed
        Started At:    2020-01-27T13:59:23Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://12c1d41c44ad1452a7a0447fde276389e9666eb6fe3615562ff8629fc8287fcb
        Exit Code:     1
        Finished At:   2020-01-27T13:59:58Z
        Reason:        Error
        Started At:    2020-01-27T13:59:25Z
    Completion Time:   2020-01-27T14:00:39Z
    Conditions:
      Last Transition Time:  2020-01-27T14:00:39Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45
    Start Time:              2020-01-27T13:59:59Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://f5446862d8569e3ed196cccbb81ed914e81583212ada93f93ce86ab9231eaad9
        Exit Code:     0
        Finished At:   2020-01-27T14:00:38Z
        Reason:        Completed
        Started At:    2020-01-27T14:00:04Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://28f25ad9ca86a16ac07df796c97edacfb7b3382753d94f88a7ca0cf641c7b548
        Exit Code:     1
        Finished At:   2020-01-27T14:00:38Z
        Reason:        Error
        Started At:    2020-01-27T14:00:06Z
    Completion Time:   2020-01-27T14:01:22Z
    Conditions:
      Last Transition Time:  2020-01-27T14:01:22Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e
    Start Time:              2020-01-27T14:00:40Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://50ba929c6ca8e7138730f8ae8ae98a96f598129cdbc4b843ac87e3426f8beac1
        Exit Code:     0
        Finished At:   2020-01-27T14:01:21Z
        Reason:        Completed
        Started At:    2020-01-27T14:00:46Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://3a3edec5d11790be605bbb14ba79fc0cec53b3c14f245436d3236cadd4e0857a
        Exit Code:     1
        Finished At:   2020-01-27T14:01:21Z
        Reason:        Error
        Started At:    2020-01-27T14:00:50Z
    Completion Time:   2020-01-27T14:02:05Z
    Conditions:
      Last Transition Time:  2020-01-27T14:02:05Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-fdf4b8 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-fdf4b8
    Start Time:              2020-01-27T14:01:22Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://bcd1f4e2e86f070abe6da5f9f4bc0aaca0d2017247f34fe960feff855601369a
        Exit Code:     0
        Finished At:   2020-01-27T14:02:04Z
        Reason:        Completed
        Started At:    2020-01-27T14:01:28Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://2259d58133341a81ec0268ab1a81b7123f5d5cab38aea190337cd1314eda2be0
        Exit Code:     1
        Finished At:   2020-01-27T14:02:04Z
        Reason:        Error
        Started At:    2020-01-27T14:01:31Z
    Completion Time:   2020-01-27T14:02:53Z
    Conditions:
      Last Transition Time:  2020-01-27T14:02:53Z
      Message:               "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-5fe409 -c step-task-one-step-two
      Reason:                Failed
      Status:                False
      Type:                  Succeeded
    Pod Name:                sample-piperun-on-zero-nine-task2-27dbl-pod-5fe409
    Start Time:              2020-01-27T14:02:05Z
    Steps:
      Container:  step-task-one-step-one
      Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:       task-one-step-one
      Terminated:
        Container ID:  docker://cb3e0d185235613ad4155fbc4addb611e4757bb6ec7554f3a6d63ac4eb584e20
        Exit Code:     0
        Finished At:   2020-01-27T14:02:52Z
        Reason:        Completed
        Started At:    2020-01-27T14:02:14Z
      Container:       step-task-one-step-two
      Image ID:        docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
      Name:            task-one-step-two
      Terminated:
        Container ID:  docker://fedc8c07000cdaf4982f60bd352981de408dbb352969af0c39f74e7fe70f279f
        Exit Code:     1
        Finished At:   2020-01-27T14:02:53Z
        Reason:        Error
        Started At:    2020-01-27T14:02:20Z
  Start Time:          2020-01-27T14:02:53Z
  Steps:
    Container:  step-task-one-step-one
    Image ID:   docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
    Name:       task-one-step-one
    Running:
      Started At:  2020-01-27T14:03:01Z
    Container:     step-task-one-step-two
    Image ID:      docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110
    Name:          task-one-step-two
    Running:
      Started At:  2020-01-27T14:03:06Z
Events:
  Type     Reason  Age    From                Message
  ----     ------  ----   ----                -------
  Warning  Failed  5m28s  taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-b07a14 -c step-task-one-step-two
  Warning  Failed  4m42s  taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-79cb79 -c step-task-one-step-two
  Warning  Failed  4m1s   taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-75e5be -c step-task-one-step-two
  Warning  Failed  3m20s  taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-ef7348 -c step-task-one-step-two
  Warning  Failed  2m40s  taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-77ec45 -c step-task-one-step-two
  Warning  Failed  117s   taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-50d94e -c step-task-one-step-two
  Warning  Failed  74s    taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-fdf4b8 -c step-task-one-step-two
  Warning  Failed  26s    taskrun-controller  "step-task-one-step-two" exited with code 1 (image: "docker-pullable://ubuntu@sha256:8d31dad0c58f552e890d68bbfb735588b6b820a46e459672d96e585871acc110"); for logs run: kubectl -n default logs sample-piperun-on-zero-nine-task2-27dbl-pod-5fe409 -c step-task-one-step-two

@steveodonovan
Member Author

steveodonovan commented Jan 27, 2020

@vdemeester FYI
0f20c35#diff-14481606a469dcd50f6cb9f5540e3b06R543 and 0f20c35#diff-14481606a469dcd50f6cb9f5540e3b06L322

Looks like it's the same issue I described in Slack: the assumption of a 1:1 mapping between a taskRun and a pod. The retry pod will never be created. Not sure how to fix this, given that we have the same problem of not being able to uniquely identify a pod that has failed, is retrying, or has retried. We could count the pods, but once they complete we won't know which status to attach to the taskRun.

@vdemeester
Member

vdemeester commented Jan 29, 2020

Something is weird though: I thought retries in a pipeline would generate new TaskRuns instead of re-using the same one… It is interesting that they do not 😅

Reusing the same TaskRun is weird, as this means we are doing the following:

  1. Create TaskRun
  2. TaskRun Fails
  3. Update the TaskRun status to force the reconciler to re-run it

This… well… doesn't work as of now: because the pod is already there, the TaskRun fails instantly (whereas before it would re-create the pod for some odd reason; well, not so odd, since podName gets removed).
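
The three steps above can be sketched as a toy model. This is only an illustration in Python, not Tekton's Go code: the `TaskRunStatus` class, its fields, and `reset_for_retry` are hypothetical names, loosely mirroring the `Pod Name` and retried-status entries in the `describe` output earlier in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskRunStatus:
    """Hypothetical, heavily simplified stand-in for a TaskRun status."""
    pod_name: Optional[str] = None
    succeeded: Optional[bool] = None  # None means "not finished yet"
    retries_status: List["TaskRunStatus"] = field(default_factory=list)


def reset_for_retry(status: TaskRunStatus) -> None:
    """Step 3: archive the failed attempt, then clear the live status."""
    archived = TaskRunStatus(pod_name=status.pod_name, succeeded=status.succeeded)
    status.retries_status.append(archived)
    status.pod_name = None   # the retry relies on this being empty again
    status.succeeded = None


# Steps 1 and 2: the TaskRun was created and its pod failed.
status = TaskRunStatus(
    pod_name="sample-piperun-on-zero-nine-task2-27dbl-pod-b07a14",
    succeeded=False,
)
reset_for_retry(status)
print(status.pod_name, len(status.retries_status))
```

Whether a fresh pod appears afterwards depends entirely on how the TaskRun reconciler reacts to the cleared status.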

@vdemeester
Member

  • The way retries are implemented assumes that a TaskRun can spin up multiple Pods (depending on what we do). In short, if a TaskRun fails but has retries configured in the Pipeline, the retry logic clears the status (removing lots of things, including podName) and assumes that the TaskRun reconciler will re-create a Pod.
  • The way TaskRun works (as of 0.10.1) means a TaskRun can have only one Pod and will never re-create one (unless you delete the Pod and clear the status).
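
The mismatch between these two assumptions can be shown with a small simulation. It is a sketch under stated assumptions, not the actual reconciler: the in-memory `pods` dict and both `reconcile_*` functions are hypothetical, and the "find a pod owned by the TaskRun" lookup is only a guess at how a one-pod-per-TaskRun rule might be enforced.

```python
from typing import Dict

# Hypothetical in-memory "cluster": pods keyed by name, each remembering its owner.
pods: Dict[str, dict] = {}


def create_pod(taskrun: dict) -> str:
    name = f"{taskrun['name']}-pod-{len(pods)}"
    pods[name] = {"owner": taskrun["name"], "phase": "Running"}
    return name


def reconcile_by_pod_name(taskrun: dict) -> str:
    """Assumed older behaviour: trust the recorded pod name; empty means 'create'."""
    if taskrun["pod_name"] is None:
        taskrun["pod_name"] = create_pod(taskrun)
    return taskrun["pod_name"]


def reconcile_by_owner(taskrun: dict) -> str:
    """Assumed newer behaviour: reuse any pod that already belongs to the TaskRun."""
    for name, pod in pods.items():
        if pod["owner"] == taskrun["name"]:
            taskrun["pod_name"] = name  # the old, already-failed pod is found again
            return name
    taskrun["pod_name"] = create_pod(taskrun)
    return taskrun["pod_name"]


tr = {"name": "task2", "pod_name": None}
first = reconcile_by_owner(tr)        # creates the first pod
pods[first]["phase"] = "Failed"       # the step exits with code 1

tr["pod_name"] = None                 # the retry clears the status
second = reconcile_by_owner(tr)       # finds the failed pod, creates nothing

tr["pod_name"] = None
third = reconcile_by_pod_name(tr)     # creates a fresh pod for the retry
```

In this model `second` is the same failed pod as `first`, which matches the "fails instantly" symptom, while `third` is a new pod.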

This puts the fix in a weird state (and makes it a huge fix, as far as I can see):

  1. Either we assume a 1:1 relation between TaskRun and Pod, in which case we need to rewrite how retries work. We can add a field to the TaskRun telling it how many times it should retry (more or less moving the retries feature to the TaskRun reconciler) and let it handle that, or we create a TaskRun for each retry (and leave the feature in the PipelineRun reconciler).
  2. Or, on retries, we delete the Pod, clear the status, and let the TaskRun reconciler do its thing. This is the worst option, as we are literally removing the Pod (and thus the logs of the failure); I can't see a good reason to do that.
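
As an illustration of the "one TaskRun per retry" variant of option 1, here is a minimal sketch. It assumes a hypothetical `run_once` callback that creates a TaskRun with the given name and reports whether it succeeded; the `-attempt-N` naming scheme is invented for the example and is not an existing Tekton convention.

```python
from typing import Callable, List, Tuple


def run_with_retries(
    base_name: str,
    retries: int,
    run_once: Callable[[str], bool],
) -> List[Tuple[str, bool]]:
    """Create one hypothetical TaskRun (and therefore one pod) per attempt."""
    attempts: List[Tuple[str, bool]] = []
    for attempt in range(retries + 1):  # the first run plus up to `retries` retries
        name = f"{base_name}-attempt-{attempt}"
        ok = run_once(name)
        attempts.append((name, ok))
        if ok:
            break  # stop retrying as soon as one attempt succeeds
    return attempts


# Example: a task that fails twice, then succeeds.
outcomes = iter([False, False, True])
history = run_with_retries("task2", retries=3, run_once=lambda name: next(outcomes))
```

Each attempt keeps its own name, status, and pod, so the 1:1 TaskRun-to-Pod assumption holds and the logs of failed attempts are not lost.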

/cc @steveodonovan @bobcatfish @imjasonh

@afrittoli
Member

Clearing the status and deleting the pod does not sound like something we want to do, at least not until we have a way of storing the status and logs somewhere else.

If I understood correctly, the change that broke things is that the reconciler no longer relies on the podName from the taskRun. My suggestion would be to revert that change for now to fix this. We can then take time in 0.11 to design a new way for retries to work, together with #1709.
