fix: prevent execution records from getting stuck in running state #140

Merged
Qsnh merged 2 commits into main from fix/execution-status-cleanup
Apr 30, 2026

@Qsnh Qsnh commented Apr 30, 2026

Three independent issues left `worker_execution` rows orphaned in the `running` state, so downstream busy-checks (e.g. the `/engine` command) reported workers as occupied when nothing was actually executing:

  1. `monitorExecution` only updated the row on Done/Error output. When a process was killed, crashed, or was terminated by a signal, the channel closed silently and the row stayed `running` forever. Add a fallback path that finalizes the row (status=failed, `completed_at` set) when the output channel closes without a terminal signal.

  2. Task cancel only attempted to stop a locally tracked process; if the process had already exited, the `StopExecution` error was logged but the row was never updated. Add a MarkAbandoned-on-failure path so the execution row is finalized even when the process is gone, and apply the same logic to the bulk clear-session cancel loop.

  3. Server startup recovered tasks but not executions, so a crash or restart left every in-flight execution stuck in `running`. Add `ExecutionStore.ResetRunningExecutions` and call it alongside the existing recovery in `app.BuildApp`.
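To illustrate the first fix, here is a minimal sketch of a monitor loop with a close-without-terminal-signal fallback. It is not the code from this PR: the `Output`, `Execution`, and `memStore` types, the `finalize` helper, and the `completed` status string are all hypothetical stand-ins chosen for the example; only the `failed` status and the `completed_at` field come from the description above.

```go
package main

import (
	"sync"
	"time"
)

// Output is a hypothetical message emitted by a running process.
type Output struct {
	Done bool
	Err  error
}

// Execution is a hypothetical stand-in for a worker_execution row.
type Execution struct {
	Status      string
	CompletedAt time.Time
}

// memStore is a hypothetical in-memory replacement for the real store.
type memStore struct {
	mu   sync.Mutex
	rows map[string]*Execution
}

// finalize sets a terminal status and stamps the completion time.
func (s *memStore) finalize(id, status string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if row, ok := s.rows[id]; ok {
		row.Status = status
		row.CompletedAt = time.Now()
	}
}

// monitorExecution drains the output channel until a terminal message
// arrives. If the channel closes first (process killed or crashed),
// the fallback after the loop still finalizes the row as failed.
func monitorExecution(s *memStore, id string, out <-chan Output) {
	for msg := range out {
		switch {
		case msg.Err != nil:
			s.finalize(id, "failed")
			return
		case msg.Done:
			s.finalize(id, "completed") // assumed name for the success status
			return
		}
	}
	// Fallback: channel closed without a Done or Error message.
	s.finalize(id, "failed")
}
```

The key point is that the loop exit itself is treated as a terminal event, so no code path leaves the row in `running`.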

Adds a `MarkAbandoned` helper that only updates pending/running rows, so terminal states are never clobbered, plus regression tests covering all three paths.
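The guard described here, together with the cancel fallback and the startup reset, can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the in-memory `Store`, the `cancelExecution` function with its injected `stop` callback, the `errNotTracked` error, the return values, and the choice of `failed` as the status written by `MarkAbandoned` are all hypothetical. The real helpers presumably operate on database rows.

```go
package main

import (
	"errors"
	"time"
)

// Execution is a hypothetical stand-in for a worker_execution row.
type Execution struct {
	Status      string
	CompletedAt time.Time
}

// Store is a hypothetical in-memory stand-in for the execution store.
type Store struct {
	rows map[string]*Execution
}

// errNotTracked is a hypothetical error for a process that already exited.
var errNotTracked = errors.New("process not tracked locally")

// MarkAbandoned finalizes a row only if it is still pending or running,
// so rows already in a terminal state are left untouched.
// It reports whether the row was changed.
func (s *Store) MarkAbandoned(id string) bool {
	row, ok := s.rows[id]
	if !ok || (row.Status != "pending" && row.Status != "running") {
		return false
	}
	row.Status = "failed" // assumed terminal status
	row.CompletedAt = time.Now()
	return true
}

// ResetRunningExecutions finalizes every row still marked running,
// as a startup recovery step would, and returns how many it changed.
func (s *Store) ResetRunningExecutions() int {
	n := 0
	for _, row := range s.rows {
		if row.Status == "running" {
			row.Status = "failed"
			row.CompletedAt = time.Now()
			n++
		}
	}
	return n
}

// cancelExecution tries to stop the process; if stopping fails
// (for example because the process is already gone), it still
// finalizes the row through MarkAbandoned.
func cancelExecution(s *Store, id string, stop func(string) error) {
	if err := stop(id); err != nil {
		s.MarkAbandoned(id)
	}
}
```

Calling `cancelExecution(s, id, func(string) error { return errNotTracked })` on a `running` row finalizes it, while calling `MarkAbandoned` on a row that has already reached a terminal state returns `false` and changes nothing.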

Qsnh and others added 2 commits April 29, 2026 17:45
- Replace `error(nil)` with idiomatic `var stopErr error`
- Extract duplicated MarkAbandoned + log pattern into finalizeCancelledExecution helper
- Drop "Used by:" caller list from MarkAbandoned doc comment

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Qsnh Qsnh merged commit b88ab90 into main Apr 30, 2026
@Qsnh Qsnh deleted the fix/execution-status-cleanup branch April 30, 2026 00:55