Improvements on pipeline cancel #2543

Merged: tekton-robot merged 1 commit into tektoncd:master from afrittoli:issues/2082-simplify-pipeline-cancel, May 7, 2020
```diff
@@ -48,6 +48,7 @@ var (
 		stats.UnitDimensionless)
 )

+// Recorder holds keys for Tekton metrics
 type Recorder struct {
 	initialized bool
```

Review comment on the added doc line: thanks for adding docs!
```diff
@@ -158,9 +158,10 @@ func (c *Reconciler) Reconcile(ctx context.Context, key string) error {

 	// In case of reconcile errors, we store the error in a multierror, attempt
 	// to update, and return the original error combined with any update error
-	var merr error
+	var merr *multierror.Error

-	if pr.IsDone() {
+	switch {
+	case pr.IsDone():
 		// We may be reading a version of the object that was stored at an older version
 		// and may not have had all of the assumed default specified.
 		pr.SetDefaults(contexts.WithUpgradeViaDefaulting(ctx))
```

Review comment on `switch {`: nice :D

```diff
@@ -181,7 +182,13 @@ func (c *Reconciler) Reconcile(ctx context.Context, key string) error {
 				c.Logger.Warnf("Failed to log the metrics : %v", err)
 			}
 		}(c.metrics)
-	} else {
+	case pr.IsCancelled():
+		// If the pipelinerun is cancelled, cancel tasks and update status
+		before := pr.Status.GetCondition(apis.ConditionSucceeded)
+		merr = multierror.Append(merr, cancelPipelineRun(c.Logger, pr, c.PipelineClientSet))
+		after := pr.Status.GetCondition(apis.ConditionSucceeded)
+		reconciler.EmitEvent(c.Recorder, before, after, pr)
+	default:
 		if err := c.tracker.Track(pr.GetTaskRunRef(), pr); err != nil {
 			c.Logger.Errorf("Failed to create tracker for TaskRuns for PipelineRun %s: %v", pr.Name, err)
 			c.Recorder.Event(pr, corev1.EventTypeWarning, eventReasonFailed, "Failed to create tracker for TaskRuns for PipelineRun")
```

```diff
@@ -228,7 +235,7 @@ func (c *Reconciler) Reconcile(ctx context.Context, key string) error {
 		}(c.metrics)
 	}

-	return merr
+	return merr.ErrorOrNil()
 }

 func (c *Reconciler) updatePipelineResults(ctx context.Context, pr *v1alpha1.PipelineRun) {
```

```diff
@@ -527,15 +534,6 @@ func (c *Reconciler) reconcile(ctx context.Context, pr *v1alpha1.PipelineRun) er
 		}
 	}

-	// If the pipelinerun is cancelled, cancel tasks and update status
-	if pr.IsCancelled() {
-		before := pr.Status.GetCondition(apis.ConditionSucceeded)
-		err := cancelPipelineRun(c.Logger, pr, pipelineState, c.PipelineClientSet)
-		after := pr.Status.GetCondition(apis.ConditionSucceeded)
-		reconciler.EmitEvent(c.Recorder, before, after, pr)
-		return err
-	}
-
 	if pipelineState.IsBeforeFirstTaskRun() && pr.HasVolumeClaimTemplate() {
 		// create workspace PVC from template
 		if err = c.pvcHandler.CreatePersistentVolumeClaimsForWorkspaces(pr.Spec.Workspaces, pr.GetOwnerReference(), pr.Namespace); err != nil {
```

bobcatfish marked this conversation as resolved.
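One subtlety in the `return merr` → `return merr.ErrorOrNil()` change above: once `merr` is a concrete `*multierror.Error`, returning it directly would hand the caller a non-nil `error` interface even when the pointer itself is nil (Go's typed-nil pitfall). A stdlib-only sketch of why `ErrorOrNil` is needed, using a hypothetical `multiError` stand-in rather than the real hashicorp/go-multierror package:

```go
package main

import "fmt"

// multiError is a stand-in for hashicorp's *multierror.Error: a concrete
// pointer type that implements the error interface.
type multiError struct{ errs []error }

func (m *multiError) Error() string { return fmt.Sprintf("%d error(s)", len(m.errs)) }

// ErrorOrNil mirrors multierror's method of the same name: it returns a
// true nil interface value when there is nothing to report.
func (m *multiError) ErrorOrNil() error {
	if m == nil || len(m.errs) == 0 {
		return nil
	}
	return m
}

// reconcileBuggy shows the pitfall: the nil *multiError pointer is boxed
// into a non-nil error interface value on return.
func reconcileBuggy() error {
	var merr *multiError // stays nil on the happy path
	return merr
}

// reconcileFixed returns a true nil when no errors were collected.
func reconcileFixed() error {
	var merr *multiError
	return merr.ErrorOrNil()
}

func main() {
	fmt.Println(reconcileBuggy() != nil) // true: looks like an error even though nothing failed
	fmt.Println(reconcileFixed() != nil) // false
}
```

With `var merr error` (the old code), a plain `return merr` was safe; after switching the declaration to the concrete pointer type, `ErrorOrNil()` restores that safety.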
Review comment: nice!
Review comment: Just double checking: can we guarantee that all started TaskRuns will be in the status? I'm guessing we can, but wanted to check. I think this would mean that in the same reconcile loop in which we create a TaskRun, we always update the status. (I'm also wondering whether, if there was an error partway through, we might end up in an inconsistent state: e.g. we create the TaskRun, then a transient networking error prevents us from updating the status?)
Author: Yeah, I thought about that. In my understanding the reconciler would not take the same key from the queue twice concurrently. When we start to reconcile we get the latest version of the PipelineRun from the API, which should have all running TaskRuns. It might be that a previous reconcile failed to update the status of the pipeline, but the same may be true for the TaskRun status too, so I feel we are not introducing a new issue.
Author: The current implementation uses the pipelineState, which requires fetching the pipeline, TaskRuns, tasks, and pipeline resources, then validating and binding them all together. I was hoping to move the handling of cancel earlier in the reconcile and thus avoid calculating the pipelineState.
Review comment: I guess in that case the next reconcile would notice that the status needed to be updated anyway, update it, and then the "is cancelled" block would catch any TaskRuns that need to be cancelled that weren't before 🤔 It seems like eventually it would end up in the right state?
Author: Thank you, you make a very good point. I agree that the scenario you describe is not covered. Looking deeper into the code, it looks like it is not covered today either 😅

ResolvePipelineRun is used to build the pipeline run state. That function looks up each TaskRun recorded in the PipelineRun status by name and, if a TaskRun is NotFound, ignores it. This means that if we fail to sync the PipelineRun status, we never go and list TaskRuns by owner, so the TaskRuns created during the reconcile cycle where the issue happened become orphaned. A new name is generated for the tasks, so presumably dag.GetSchedulable will return the task again and it will end up running the same task(s) twice. It is a rather unlikely case, since it requires being able to create a TaskRun and then, a few milliseconds later, being unable to update the PipelineRun status; however, it may well happen.

I think a way forward would be to ensure that the PipelineRun status is valid. To make sure that the status is in sync with the list of TaskRuns owned by the PipelineRun, we can run this check before we try to use the PipelineRun status as the source of truth. To do so we only need the PipelineRun status and the object ref of the PipelineRun, so we do not need to go through the resolution, validation, conversion, and re-conversion madness.

I will propose a patch that does that, keep this on hold, and once (if) the other patch is approved, it should then be fine to merge this roughly as it is today. Does that sound like a plan?
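The check proposed above (compare the TaskRuns recorded in the status against the TaskRuns actually owned by the PipelineRun, as returned by a List call filtered on owner reference) can be sketched in miniature. The helper name and the plain-string stand-ins for Tekton's TaskRun objects are hypothetical, not the real API:

```go
package main

import "fmt"

// findOrphanedTaskRuns reports TaskRuns that are owned by the PipelineRun
// but missing from its status, so they can be re-adopted into the status
// instead of being re-created under a new generated name.
// statusTaskRuns: names recorded in pr.Status; ownedTaskRuns: names found
// by listing TaskRuns with this PipelineRun as owner.
func findOrphanedTaskRuns(statusTaskRuns map[string]struct{}, ownedTaskRuns []string) []string {
	var orphans []string
	for _, name := range ownedTaskRuns {
		if _, tracked := statusTaskRuns[name]; !tracked {
			orphans = append(orphans, name)
		}
	}
	return orphans
}

func main() {
	status := map[string]struct{}{"pr-task-1-abc": {}}
	// pr-task-2-xyz was created, but the status update afterwards failed.
	owned := []string{"pr-task-1-abc", "pr-task-2-xyz"}
	fmt.Println(findOrphanedTaskRuns(status, owned)) // [pr-task-2-xyz]
}
```

Running this check needs only the status and the owner reference, which is why it avoids the full pipelineState resolution.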
Author: #2558
Author: @bobcatfish Since, given #2558, this patch does not change the situation compared to today, I wonder if we could go ahead with this and fix #2558 afterwards.
Author: I set up a unit test to verify that we do not recover orphaned TaskRuns, assuming that they may happen in case the PipelineRun status sync fails. This WIP fix shows how we could solve the issue: #2558
Review comment: Yikes!! Nice find :D OK, sounds good to me! Thanks for looking into this and opening the other issue 🙏