Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(retries) Design and implement retries #658

Merged
merged 1 commit into from
May 6, 2019

Conversation

joseblas
Copy link
Contributor

@joseblas joseblas commented Mar 22, 2019

closes #221

Changes

Adding retries to pipeline. Pipeline/Tasks will have retries that will be passed to the subsequent pipeline|task run.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide
for more details.

Release Notes

Adds the possibility to execute automated retries when something fails, could be applied to Pipelines or Tasks.

Adds support for specifying `retries` at the level of a `Task` within a `Pipeline`

@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2019
@googlebot googlebot added the cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit label Mar 22, 2019
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 22, 2019
@abayer
Copy link
Contributor

abayer commented Mar 22, 2019

/ok-to-test

@tekton-robot tekton-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 22, 2019
@bobcatfish
Copy link
Collaborator

Hey @joseblas !

So to resurrect our conversation from #510, the design that makes the most sense to me given our requirements is still having retries on Pipeline.Spec.Tasks, e.g.:

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
..
  tasks:
  - name: build-skaffold-web
    retries: 3
 ...
  - name: build-skaffold-app
    retries: 3
...
  - name: deploy-app
  ...
  - name: deploy-web

It doesn't seem to me like we need to have retries in the Run specs, at least initially (unless there is a use case I am completely missing where we need to override this at runtime?)

I thought about static/dynamic information, so the number of retries to be done should be in the specs in the runs info and the number of retries done within the Status. (From #510)

We are definitely on the same page that the number of retries should be in the spec of the CRD, and the status should keep track of how many have been attempted, the big difference is that I think it makes more sense (at least initially) to put this information into the Pipeline spec and not into PipelineRun or TaskRun.

If you want to discuss more there is time in our Tuesday 9am PST working group meeting (https://github.com/tektoncd/pipeline/blob/master/CONTRIBUTING.md#contact) for design discussions now so we could keep the discussion going there if you want to speed it up :)

@joseblas
Copy link
Contributor Author

Hi @bobcatfish,
following up that, retries will be applied only to tasks, right?

@bobcatfish
Copy link
Collaborator

Quick update: @joseblas and I discussed further offline :)

@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 2, 2019
@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 4, 2019
@joseblas joseblas force-pushed the retries-221 branch 2 times, most recently from 12f9b54 to 5c45e67 Compare April 4, 2019 17:03
@joseblas joseblas changed the title WIP feat(retries) Design and implement retries feat(retries) Design and implement retries Apr 5, 2019
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2019
@abayer
Copy link
Contributor

abayer commented Apr 5, 2019

/test pull-tekton-pipeline-integration-tests

2 similar comments
@joseblas
Copy link
Contributor Author

joseblas commented Apr 5, 2019

/test pull-tekton-pipeline-integration-tests

@abayer
Copy link
Contributor

abayer commented Apr 8, 2019

/test pull-tekton-pipeline-integration-tests

@abayer
Copy link
Contributor

abayer commented Apr 8, 2019

@joseblas suggests we may want to hold off on merging this until #731 is resolved - timeouts are generally flaky, basically.

@abayer
Copy link
Contributor

abayer commented Apr 8, 2019

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 8, 2019
Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really coming along! The interface is looking great :D

I have a lot of miscellaneous comments. One high level thing I noticed is that I don't see any logic handling cancellations - if a taskrun is cancelled we shouldn't retry it

(note i didnt look at TestReconcileWithTimeoutAndRetry yet - i have a feeling we need to add more test cases at a lower level and possibly a couple at that level as well)

@@ -140,6 +140,16 @@ tasks:
name: build-push
```

There is an optional attribute called `retries`, which declares how many times that task should be retries in case of failure,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a section and add a link to the "optional" section at the top of the doc re. attributes of tasks?

e.g. in the section at the top:

- Optional:
  - [`resources`](#declared-resources) - Specifies which
    [`PipelineResources`](resources.md) of which types the `Pipeline` will be
    using in its [Tasks](#pipeline-tasks)
  - `tasks`
    - `resources.inputs` / `resource.outputs`
      - [`from`](#from) - Used when the content of the
        [`PipelineResource`](resources.md) should come from the
        [output](tasks.md#output) of a previous [Pipeline Task](#pipeline-tasks)
      - [`runAfter`](#runAfter) - Used when the [Pipeline Task](#pipeline-task)
        should be executed after another Pipeline Task, but there is no
        [output linking](#from) required
    - [`retries`](#retries)

And here:

#### Retries

An optional attribute....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

docs/pipelines.md Outdated Show resolved Hide resolved
@@ -73,6 +73,10 @@ type PipelineTask struct {
Name string `json:"name"`
TaskRef TaskRef `json:"taskRef"`

// Number of retries to be run
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment could also use some more detail re. what it means to retry and under what circumstances that will happen

pkg/apis/pipeline/v1alpha1/taskrun_types.go Outdated Show resolved Hide resolved
test/pipelinerun_test.go Outdated Show resolved Hide resolved
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 24, 2019
@joseblas
Copy link
Contributor Author

/test pull-tekton-pipeline-integration-tests

@joseblas joseblas force-pushed the retries-221 branch 2 times, most recently from bf8fbe8 to 17023bb Compare April 29, 2019 17:05
@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 29, 2019
@joseblas
Copy link
Contributor Author

Hi @bobcatfish, I've pushed some code according your comments. I did do a comment earlier but I deleted it as there was an error in the code. Sorry for that.

@abayer
Copy link
Contributor

abayer commented Apr 30, 2019

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 30, 2019
pkg/reconciler/v1alpha1/pipelinerun/pipelinerun.go Outdated Show resolved Hide resolved
docs/pipelines.md Show resolved Hide resolved
docs/pipelines.md Show resolved Hide resolved
pkg/reconciler/v1alpha1/pipelinerun/pipelinerun.go Outdated Show resolved Hide resolved
@ghost
Copy link

ghost commented May 6, 2019

Chatted offline about all of the above comments. Everything looks good👍

@abayer
Copy link
Contributor

abayer commented May 6, 2019

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 6, 2019
@abayer
Copy link
Contributor

abayer commented May 6, 2019

oh wait yeah
/approve

@abayer
Copy link
Contributor

abayer commented May 6, 2019

ugh, I don't have approve rights... @bobcatfish, halp? =)

@bobcatfish
Copy link
Collaborator

/approve

it's happening!!! Thanks for your hard work on this and patience @joseblas 🎉 ❤️

/meow space

@tekton-robot
Copy link
Collaborator

@bobcatfish: cat image

In response to this:

/approve

it's happening!!! Thanks for your hard work on this and patience @joseblas 🎉 ❤️

/meow space

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abayer, bobcatfish, joseblas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2019
@tekton-robot tekton-robot merged commit 295bf9c into tektoncd:master May 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Design and implement retries
5 participants