
Timed Out or Cancelled TaskRun Pods are Deleted #3051

Closed
danielhelfand opened this issue Aug 3, 2020 · 15 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@danielhelfand
Member

danielhelfand commented Aug 3, 2020

Expected Behavior

After a TaskRun times out or is cancelled, I should still be able to view the logs of the failed or cancelled TaskRun.

Actual Behavior

It appears that the pods for TaskRuns are being deleted due to a change implemented in #2365. In the func failTaskRun in taskrun.go, the pod associated with the TaskRun is deleted, as noted here.

Assuming this is not an expected feature, I would suggest checking the failure reason before deleting the TaskRun pod:

// Only delete the pod when the TaskRun failed for a reason other than
// a timeout or cancellation, so those runs keep their logs.
if reason != v1beta1.TaskRunReasonTimedOut && reason != v1beta1.TaskRunReasonCancelled {
	err := c.KubeClientSet.CoreV1().Pods(tr.Namespace).Delete(tr.Status.PodName, &metav1.DeleteOptions{})
	if err != nil && !k8serrors.IsNotFound(err) {
		logger.Infof("Failed to terminate pod: %v", err)
		return err
	}
}

If this is expected, it would help to document this behavior for TaskRuns as well as recommended best practices for preserving logs.

Steps to Reproduce the Problem

  1. Create a TaskRun with a timeout short enough to be exceeded, or cancel a running TaskRun
  2. Run kubectl get pods to confirm that the TaskRun's pod has been deleted

Additional Info

  • Kubernetes version:

N/A

  • Tekton Pipeline version:

Using what is in latest as of v0.15.0.

@danielhelfand danielhelfand added the kind/bug Categorizes issue or PR as related to a bug. label Aug 3, 2020
@danielhelfand
Member Author

/assign

Will wait for confirmation on whether this is expected for TaskRuns, but I can pick this up if it is unintended.

@pritidesai
Member

/cc @afrittoli @bobcatfish

@vdemeester
Member

vdemeester commented Aug 4, 2020

It appears that the pods for TaskRuns are being deleted due to a change implemented in #2365. In the func, failTaskRun in taskrun.go, the pod associated with the TaskRun is being deleted as noted here.

#2365 didn't change that behavior (see https://github.com/tektoncd/pipeline/pull/2365/files#diff-0239ea30db655388fa943dffd3b7a2e6L49); the cancellation feature has always worked by deleting the pod. That was, at the time, the only way to do it. As @imjasonh said somewhere, we may be able to do that now using the entrypoint and some signal (what docker kill was doing; I'm not sure of the equivalent in k8s yet).
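
Purely as a sketch of that idea (not the actual Tekton entrypoint implementation), an entrypoint wrapper could run the step command as a child process and forward a termination signal to it, so the step stops while the pod, and its logs, stick around:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: entrypoint <cmd> [args...]")
		os.Exit(1)
	}

	// Run the real step command as a child process.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// On a termination signal, forward it to the step so the step can
	// stop gracefully instead of the whole pod being deleted.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	go func() {
		_ = cmd.Process.Signal(<-sigs)
	}()

	if err := cmd.Wait(); err != nil {
		os.Exit(1)
	}
}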

@danielhelfand
Member Author

Yes, my misunderstanding, but it makes sense now. I see that this is what is required to actually stop the TaskRun; I guess I never noticed the behavior. Two things would be nice here:

  1. This behavior should be documented under the TaskRun documentation. My original thought when this came up was to check the docs to see if this was expected, but nothing is there now.
  2. This should become a feature request for a way to preserve logs, either via the entrypoint approach or something like what Jason mentioned on Slack (see the sketch after the quote):

In general we should have users expect to find logs in some log persistence solution (e.g., Stackdriver) and not rely on the Pod still being around at all
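
For illustration, capturing a step container's logs before the pod goes away could look something like this (a hypothetical helper, assuming a recent client-go; this is not an existing Tekton API):

import (
	"context"
	"io"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// snapshotStepLogs is a hypothetical helper that copies one step
// container's logs to a durable destination (e.g. an external log
// store) before the pod is deleted.
func snapshotStepLogs(ctx context.Context, kube kubernetes.Interface, ns, pod, container string, dst io.Writer) error {
	req := kube.CoreV1().Pods(ns).GetLogs(pod, &corev1.PodLogOptions{Container: container})
	stream, err := req.Stream(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()
	_, err = io.Copy(dst, stream)
	return err
}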

@bobcatfish
Collaborator

Copying over from Slack: #381 is when we first added this, and it looks like we've been deleting the pods all along - though that surprised me too, and I agree it's inconsistent with how we treat other pods (and makes it harder for the dashboard folks :S)

@bobcatfish
Collaborator

@imjasonh created an alternate proposal in #2559

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 12, 2020
@ghost

ghost commented Nov 13, 2020

/remove-lifecycle stale

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 13, 2020
@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 13, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pritidesai
Member

/reopen

I don't think we have reached a resolution for this yet...

@pritidesai pritidesai reopened this May 7, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
