Fix polling (ECS) command executors fail to run tasks on retry #1620
Signed-off-by: Naoto Yokoyama <builtinnya@gmail.com>
Thank you for creating this PR!
@szyn
I understand your concern, but we would probably need to change code other than EcsCommandExecutor, and that could affect all other command executors anyway, because EcsCommandExecutor's polling behavior itself is achieved by the coordination of BaseOperator, ShOperatorFactory, and so on. At least the current executors can't be affected by this change as it can't rely on
digdag/digdag-standards/src/main/java/io/digdag/standards/operator/ShOperatorFactory.java Lines 105 to 113 in 0c6e58a
Operators run commands only if
To summarize my points:
Please let me know if any of my changes are unclear to you, or if you have suggestions for better changes.
I could reproduce it with the ECS command executor.
Since commandStatus is not used by other command executors, there are no drawbacks in this PR. LGTM.
While the sh and py operators try to remove commandStatus on non-zero exits, I guess they operate on a different object, so that has no effect on retrying.
https://github.com/treasure-data/digdag/blob/master/digdag-standards/src/main/java/io/digdag/standards/operator/ShOperatorFactory.java#L120
Sho agreed to merge this PR. Thank you for contributing!
Yes, finally, I could reproduce this issue... and this approach looks good to me 👍 Thank you for contributing! |
#1632) Signed-off-by: Naoto Yokoyama <builtinnya@gmail.com> Co-authored-by: Naoto Yokoyama <builtinnya@gmail.com>
This PR fixes a bug where the ECS command executor fails to submit ECS tasks on retry.
Reproducible steps
digdag version: v0.10.1 (and probably some earlier versions)
Assuming that the ECS command executor is set up correctly:
Add and run the following workflow
See task logs to confirm that the command executor actually prints "Task has been executed!" only once.
Expected behavior
In the above steps, the command executor should print "Task has been executed!" twice.
Cause
Each operator runs a command (= submits an ECS task) only when "commandStatus" doesn't exist on state params.
However, the ECS command executor's polling mechanism persists "commandStatus" even on retry, so the retry tries to poll an ECS task that has already exited.
Approach in this PR
This PR takes a minimal-change approach and simply removes "commandStatus" from state params on retry.
Considerations
TaskExecutionException for commands' failure to propagate state params.
Some operators remove "commandStatus" from state params on failure (e.g. the sh> op), but this has no effect on the state params on retry because the operator just throws RuntimeException right after that. Should we remove these confusing lines?
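The cause and the approach described in this PR can be illustrated with a small, self-contained sketch. This is not digdag's actual code: the class `RetryStateSketch`, the methods `runAttempt` and `countSubmissions`, and the use of a plain `Map` for state params are all hypothetical stand-ins. It only models the rule that a command is submitted when "commandStatus" is absent and polled otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the retry behavior; none of these names come from digdag.
public class RetryStateSketch {
    static final String COMMAND_STATUS = "commandStatus";

    // One operator attempt: submit a new command only when no status is stored,
    // otherwise poll the previously submitted command.
    public static String runAttempt(Map<String, Object> stateParams) {
        if (!stateParams.containsKey(COMMAND_STATUS)) {
            stateParams.put(COMMAND_STATUS, "submitted");
            return "submit";
        }
        return "poll";
    }

    // Simulates a first attempt followed by one retry and counts how many
    // times a command was actually submitted.
    public static int countSubmissions(boolean clearStatusOnRetry) {
        Map<String, Object> stateParams = new HashMap<>();
        int submissions = 0;
        for (int attempt = 0; attempt < 2; attempt++) {
            if (attempt > 0 && clearStatusOnRetry) {
                // The approach in this PR: drop the stale status before retrying.
                stateParams.remove(COMMAND_STATUS);
            }
            if (runAttempt(stateParams).equals("submit")) {
                submissions++;
            }
        }
        return submissions;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + countSubmissions(false)); // prints "without fix: 1"
        System.out.println("with fix: " + countSubmissions(true));     // prints "with fix: 2"
    }
}
```

In this model the stale status makes the retry poll instead of submit, which matches the message being printed once rather than twice in the reproduction steps.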