Skip to content

Create 0013-graceful-sidecar-termination.md#191

Closed
shahnewazrifat wants to merge 3 commits intotektoncd:masterfrom
shahnewazrifat:patch-1
Closed

Create 0013-graceful-sidecar-termination.md#191
shahnewazrifat wants to merge 3 commits intotektoncd:masterfrom
shahnewazrifat:patch-1

Conversation

@shahnewazrifat
Copy link
Copy Markdown

TEP for introducing the control to enable graceful sidecar termination (skipping Nop Image Swap)

TEP for introducing the control to enable graceful sidecar termination (skipping Nop Image Swap)
@tekton-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign afrittoli
You can assign the PR to them by writing /assign @afrittoli in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 31, 2020
@bobcatfish bobcatfish added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Aug 31, 2020
## Summary

Currently, the sidecar feature in a tekton pipeline/taskrun is not behaving like a typical kubernetes sidecar should.
The sidecar gets terminated immediately when the pipeline fails or completes without giving the sidecar enough time to
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: pipeline fails or completes -> taskrun fails or completes

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Thank you!


### User Stories

A `Pipeline` author can add a Sidecar with `waitForSidecar` enabled (true) and expect `natural death` of his/her sidecar.
Copy link
Copy Markdown

@ghost ghost Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a design detail I think. User Stories might be something like:

As a Task author I want to add a sidecar that uploads the results of tests to external storage after the tests complete (even if they fail) so that I can review the results later.

As a Task author I want to add a sidecar that builds a report from the outcomes of some Steps and submits that report when the Task has reached a completed state so that the reports can be aggregated by an external system.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to update the User Stories with items like that?


A `Pipeline` author can add a Sidecar with `waitForSidecar` enabled (true) and expect `natural death` of his/her sidecar.

## Design Details
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to include a description of how a running sidecar would be treated if a TaskRun timeout is reached.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation doesn't have it, but I believe, in such cases it should be force terminated.
What does Tekton do for other containers at this point?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/tektoncd/pipeline/blob/master/docs/taskruns.md#configuring-the-failure-timeout

The TaskRun's Pod is deleted if a timeout is hit. So I think the only thing to do here would be to add a line that describes this behaviour remaining the same even if the user has waitForSidecar: true. Something like "If a TaskRun's timeout is reached then Sidecars are terminated regardless of the state of this flag."

Comment thread teps/0013-graceful-sidecar-termination.md
@shahnewazrifat shahnewazrifat requested a review from a user September 1, 2020 19:21
Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a bunch of comments. I'm generally in favor of LGTMing this TEP for the feature but don't totally agree with the motivations behind it. Would also like to see the formatting cleaned up to match others and a couple of design issues addressed. Also very curious to hear from other @tektoncd/core-maintainers !

Having no way to alter the behavior was the second motivation.

Consider this task:
Task:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section formats a little strangely when rendered from the markdown. Here's a screenshot of how it renders:

image


Is this formatting intentional?

do what it originally intended to do (for example, upload logs, process artifacts etc.)
In some cases the sidecar should be able to choose when to terminate itself. A sidecar is just a kubernetes container.
It's able to terminate itself with the command completion.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra newline.


## Summary

Currently, the sidecar feature in a tekton taskrun is not behaving like a typical kubernetes sidecar should.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to confess I'm less swayed by discussion revolving around what "typical" Kubernetes patterns are than I am useful features for Tekton users. Tekton already takes a very opinionated and non-typical approach to all of the containers inside the Pods it creates. As just one example of this opinionated approach: the containers for Steps are forced to execute sequentially, which is very non-typical by k8s standards. Any broad statement like Tekton should adhere to typical Kubernetes workload semantics don't, in and of themselves, carry a ton of water when we're discussing a Tekton-focused feature (at least in my view, which probably isn't shared by everyone).

Having said that I think there are some very strong use cases supported by this TEP. I'd much prefer to see this doc focused on those usage-driven improvements.

If we are going to include an appeal to authority here and say that Tekton should support this feature simply because there's an existing precedent in Kubernetes then I'd like to see reference to a doc that outlines what the typical Kubernetes sidecar behavior is. A possible point of reference might be the seemingly-stalled Sidecars lifecycle KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0753-sidecarcontainers.md which outlines what Sidecars might eventually become if they're ever officially adopted by Kubernetes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @sbwsg here, I think this sentence could go, there is not should or should not here, only "ways" to do things

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I was driven by the facts that came to my mind rather than thinking it from Tekton perspective 😄
Thank you guys!
Shall amend 🙇


### User Stories

A `Pipeline` author can add a Sidecar with `waitForSidecar` enabled (true) and expect `natural death` of his/her sidecar.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to update the User Stories with items like that?

## Motivation

The first motivation is the counter intuitive nature of the sidecars. As a developer, I did not expect my sidecar to be
terminated in such way. I had to check the docs to find out that the sidecar was being terminated using a Nop Image.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviour is documented and has been how sidecars work since they were added to Tekton. I can understand this being surprising but similarly to my comment above about "typical" k8s behaviours I don't totally agree that this should be the primary motivation for the feature. Ideally in my view we should frame this TEP primarily around usefulness to Tekton's users.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @sbwsg , sidecars are what they are now. What we should put in motivation here is what is the benefit for users of having a way to allow sidecar to gracefully terminate. The fact that the current behaviour of sidecar is confusing or not shouldn't be a motivation on its own 😝

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, makes sense.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sbwsg and @vdemeester ! This TEP is really helping me learn how to do this! I was mixing up a proposal with implementation details. I'll rethink the whole thing and try to make it better 😄


**Pros/Cons:**
Pros:
Discussed above.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to follow the suggested format of the TEP, remove mention of Pros/Cons and just use the Advantages and Drawbacks sections.

Discussed above.

Cons:
Can't think of any. Unless the pipeline author adds a sidecar with this flag enabled and the sidecars' cmd is an infinite one. Causing the pipeline to hang
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good drawback to call out. Suggest formatting it something like:

## Drawbacks

* When the waitForSidecar flag is used and a timeout of 0 is specified for the TaskRun, a Sidecar that fails to terminate could result in a TaskRun or PipelineRun executing forever until a user manually intervenes to stop the Pod.

forever.

## Alternatives
Add WaitForSidecar flag to TaskRunSpec
Copy link
Copy Markdown

@ghost ghost Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest formatting this a bit differently just to make it a bit clearer on what's being proposed and rejected. Something like:

Alternatives

Add Sidecar stopping configuration to TaskRuns instead of Tasks

Add configuration to TaskRuns that allows per-TaskRun configuration of the sidecar stopping behavior. See example YAML below. This alternative is inferior to the proposal above because it does not allow per-Sidecar configuration of the stopping behavior. It also puts the decision to stop Sidecars into the hands of a Task's user instead of the Task's author.

apiVersion: tekton.dev/v1beta1
kind: TaskRun
spec:
  # note: setting the waitForSidecar flag on the TaskRun instead of Task
  waitForSidecar: true
  taskRef:
    kind: Task
    name: task-test-gitops-0800ac6-c7298ca-28


## References

* https://github.com/tektoncd/pipeline/issues/1131
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a link to your design doc as well for more context.

name: e2e-report
```

Golang Example:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this adds much that isn't already covered by the yaml example. Is there anything specific here you're trying to demonstrate wrt the proposed behavior or syntax?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. It's just there as an example.


## Summary

Currently, the sidecar feature in a tekton taskrun is not behaving like a typical kubernetes sidecar should.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @sbwsg here, I think this sentence could go, there is not should or should not here, only "ways" to do things

It's able to terminate itself with the command completion.


This proposal is introducing a flag `waitForSidecar` which is a per-sidecar configuration that will allow users to set
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't name a flag or field here.

This proposal is introducing a way to allow users to set, per-sidecar, the nature of termination of a sidecar inside a task should be enough. A flag or field is implementation detail, that can be tackle below 😉

## Motivation

The first motivation is the counter intuitive nature of the sidecars. As a developer, I did not expect my sidecar to be
terminated in such way. I had to check the docs to find out that the sidecar was being terminated using a Nop Image.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @sbwsg , sidecars are what they are now. What we should put in motivation here is what is the benefit for users of having a way to allow sidecar to gracefully terminate. The fact that the current behaviour of sidecar is confusing or not shouldn't be a motivation on its own 😝

The first motivation is the counter intuitive nature of the sidecars. As a developer, I did not expect my sidecar to be
terminated in such way. I had to check the docs to find out that the sidecar was being terminated using a Nop Image.

Having no way to alter the behavior was the second motivation.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could turn into positive (aka, Having a way to alter …)

Comment on lines +71 to +72
### Non-Goals
This proposal is not introducing any breaking change to current behavior. Sidecars still terminate using Nop Image by default when the pipeline is completed/failed.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really a non-goal. A non-goal could be "this proposal doesn't aim to replace the current sidecar behavior" or something like that.

@ghost
Copy link
Copy Markdown

ghost commented Sep 28, 2020

@shahnewazrifat just checking in here - are you still working on this TEP? Cheers!

@ghost
Copy link
Copy Markdown

ghost commented Nov 2, 2020

I'm going to close this for now. We can reopen if/when work on this TEP resumes!

@ghost
Copy link
Copy Markdown

ghost commented Nov 2, 2020

/close

@tekton-robot
Copy link
Copy Markdown
Contributor

@sbwsg: Closed this PR.

Details

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants