Fail durable tasks immediately for non-retryable errors #66
Currently, we classify only a few error types (including errors from user steps) as retryable. Everything else is non-retryable and causes the task to fail immediately, without any retries.
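As a rough illustration of this classification, the sketch below uses a simplified, hypothetical stand-in for `TaskError` with only two variants. The `String` payloads and the `force_fail_for` helper are assumptions made for the example; the real enum in this PR has more variants and may carry different data.

```rust
// Simplified, hypothetical stand-in for the PR's `TaskError`.
// The real enum has more variants; payload types here are assumed.
#[derive(Debug)]
enum TaskError {
    /// Failure raised inside a user step (described as retryable).
    Step(String),
    /// User-level error (the new test asserts these are not retried).
    User(String),
}

impl TaskError {
    /// Returns true only for the error kinds that should be retried.
    fn retryable(&self) -> bool {
        match self {
            TaskError::Step(_) => true,
            TaskError::User(_) => false,
        }
    }
}

/// Hypothetical helper: the flag the worker would pass to `durable.fail_run`.
fn force_fail_for(error: &TaskError) -> bool {
    !error.retryable()
}
```

With this shape, a `Step` failure yields `force_fail = false` and goes through the normal retry path, while a `User` error yields `force_fail = true` and makes the task terminal.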
Note
Medium Risk
Changes task failure/retry semantics across both Rust worker logic and the `durable.fail_run` Postgres function, which can alter how many attempts are created and when tasks become terminal. Moderate risk due to workflow correctness implications if error classification is wrong or migration rollout is incomplete.
Overview
Non-retryable errors now fail tasks immediately instead of scheduling retries. The `durable.fail_run` stored procedure gains a `p_force_fail` flag; when set, it skips retry-time computation and run creation and marks the task terminal.
Error classification and propagation were tightened in Rust. `TaskError` replaces the generic internal-error variant with `Step` and `TaskPanicked`, adds `retryable()` to drive retry decisions, and removes the blanket `From<sqlx::Error>` impl in favor of `from_sqlx_error`. `TaskContext` now wraps user `step` failures as `TaskError::Step`, and the worker passes `force_fail = !error.retryable()` when calling `durable.fail_run`.
Tests were updated/added to reflect the new semantics. Existing retry/checkpoint tests were adjusted for an extra checkpointed `maybe_fail` step, and a new retry test asserts `User` errors are not retried even when a retry strategy is configured.
Written by Cursor Bugbot for commit 7e69d63. This will update automatically on new commits.
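The effect of the force-fail flag on the failure path can be modeled in Rust as a small decision function. This is only an illustrative sketch: the actual logic lives in the `durable.fail_run` Postgres function, and the `FailOutcome` type, the `decide_fail_outcome` name, and the `max_attempts` handling are assumptions made for the example.

```rust
// Hypothetical model of the decision made when a run fails.
#[derive(Debug, PartialEq)]
enum FailOutcome {
    /// A new run is scheduled for the next attempt.
    Retry { next_attempt: u32 },
    /// No further runs are created; the task is marked failed.
    Terminal,
}

/// `force_fail` short-circuits the retry path entirely.
/// `max_attempts` is an assumed retry limit; `None` means no limit.
fn decide_fail_outcome(force_fail: bool, attempt: u32, max_attempts: Option<u32>) -> FailOutcome {
    if force_fail {
        // Skip retry-time computation and run creation.
        return FailOutcome::Terminal;
    }
    match max_attempts {
        Some(max) if attempt >= max => FailOutcome::Terminal,
        _ => FailOutcome::Retry { next_attempt: attempt + 1 },
    }
}
```

In this model, a forced failure is terminal regardless of how many attempts remain, which matches the described behavior that a non-retryable error is not retried even when a retry strategy is configured.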