Force complete activity when it is retrying #987

Open
samarabbas opened this issue Nov 13, 2020 · 10 comments
Labels
difficulty: easy enhancement New feature or request up-for-grabs Issues to consider for external contribution

Comments

@samarabbas
Contributor

Is your feature request related to a problem? Please describe.
A bug in an activity can produce a result of the wrong type, causing another activity to fail continuously. Provide a mechanism to force-complete an activity that is retrying, without restarting the workflow.

Describe the solution you'd like
The RespondActivityTaskCompletedById API does not accept the retry attempt as an input argument. We also need a way to complete an activity that is backing off and has not been started at all.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@samarabbas samarabbas added the enhancement New feature or request label Nov 13, 2020
@samarabbas samarabbas added the up-for-grabs Issues to consider for external contribution label Jul 4, 2021
@zengzilu

+1

@jdupl123

jdupl123 commented Nov 1, 2022

+1

@p4p4

p4p4 commented Feb 20, 2023

+1
For me, it would be a convenient way to “repair” a single blocked WF if an activity is periodically failing. In a way, completing an activity would make it possible to “repair forward” (skip over a failing activity), just as reset already makes it possible to “repair backwards” (go back to an earlier state of the workflow execution).

Of course, it is not best practice to (manually) repair workflows all the time, but in some edge cases and incidents it would be great to have the possibility. Redeploying an updated activity worker might not always be a solution.

@mfateev was also supporting this idea.

Eventually, this could even be integrated into the WebUI so that, for example, a support engineer could also repair a workflow.

@p4p4

p4p4 commented Feb 21, 2023

Similarly, a tctl workflow complete command might also be handy for completing whole (child) workflows. What do you think?

@alexseedkou
Contributor

alexseedkou commented Mar 21, 2024

Hi, I am interested in working on this issue.

Based on my understanding, if a workflow has multiple activities, a failure in one of them will cause the following activities to fail. Currently, the only way to fix this is to start the workflow from the beginning. We want to introduce a mechanism to retry the first failed activity when we find that it is incorrect, to prevent the cascading failures.

Is my understanding above correct? If so, how is this different from the retry options when we start a workflow with an activity here? We can handle the incorrect output from an activity with the retry options above.

Also, a way to reproduce this would be very helpful. Thank you!

@bergundy
Member

This is fairly easy to add.

First we need to understand how to expose this in the API.
I would add a bool skip_started_check or bool force field to the RespondActivityTaskCompletedByIdRequest message and to the request messages of the other RPCs that resolve an activity (RespondActivityTaskFailedByIdRequest, RespondActivityTaskCanceledByIdRequest).

Then we need to relax this condition if the flag is set (in all corresponding APIs).
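
The two steps above could be sketched roughly as follows. This is a minimal, standalone illustration and not the actual server code: the types `activityInfo` and `completeByIDRequest`, the function `validateCompletion`, and the error values are all hypothetical stand-ins. Only the flag name `skip_started_check` comes from the proposal above. The sketch assumes the server tracks whether a pending activity has been started and currently rejects completion by ID when it has not.

```go
package main

import (
	"errors"
	"fmt"
)

// activityInfo is a hypothetical, simplified stand-in for the server's
// record of a pending activity.
type activityInfo struct {
	ActivityID string
	Attempt    int32
	Started    bool // false while the activity is scheduled or backing off between retries
}

// completeByIDRequest is a hypothetical stand-in for a
// RespondActivityTaskCompletedByIdRequest carrying the proposed flag.
type completeByIDRequest struct {
	ActivityID       string
	SkipStartedCheck bool // proposed field: skip_started_check (or force)
}

var (
	errActivityNotFound   = errors.New("activity not found")
	errActivityNotStarted = errors.New("activity not started")
)

// validateCompletion applies the "activity must be started" check,
// relaxed when the request sets SkipStartedCheck.
func validateCompletion(pending map[string]*activityInfo, req completeByIDRequest) error {
	ai, ok := pending[req.ActivityID]
	if !ok {
		return errActivityNotFound
	}
	if !ai.Started && !req.SkipStartedCheck {
		return errActivityNotStarted
	}
	return nil
}

func main() {
	// An activity on its third attempt that is waiting out its retry backoff.
	pending := map[string]*activityInfo{
		"charge-card": {ActivityID: "charge-card", Attempt: 3, Started: false},
	}

	// Without the flag the completion is rejected.
	err := validateCompletion(pending, completeByIDRequest{ActivityID: "charge-card"})
	fmt.Println("without flag:", err)

	// With the flag the completion is accepted even during backoff.
	err = validateCompletion(pending, completeByIDRequest{
		ActivityID:       "charge-card",
		SkipStartedCheck: true,
	})
	fmt.Println("with flag:", err)
}
```

The same relaxed check would apply to the fail and cancel variants. Keeping the flag opt-in preserves the current behavior for existing callers.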

@bergundy
Member

If you want to take this on, you should start by making a PR to the https://github.com/temporalio/api repo and if this is accepted, you can continue to implement in the server (this) repo.

@alexseedkou
Contributor

Hi @bergundy, thank you for your reply.

May I know if my understanding above is correct regarding this issue, and how this is different from the retry options when we start a workflow with an activity here?

Thank you in advance for your guidance.

@bergundy
Member

This issue is about allowing activities that are currently backing off to be completed or failed.
It seems that's not what you want, @alexseedkou, based on your comment here:

Based on my understanding, if a workflow has multiple activities, a failure in one of them will cause the following activities to fail. Currently, the only way to fix this is to start the workflow from the beginning. We want to introduce a mechanism to retry the first failed activity when we find that it is incorrect, to prevent the cascading failures.

IIUC, you could reset the workflow to just before the activity was scheduled. Does that address your need?
Feel free to tag me on the Temporal community Slack to continue the discussion.

@alexseedkou
Contributor

An update on this issue:

The team has discussed this issue internally and decided to change the server behavior to accept activity completions by default, even if the activity is currently backing off.
