fix(orchestrator): compound-review schedule cursor (Layer 0) Refs terraphim/terraphim-ai#1562 (Gitea) #872
Merged
added 2 commits
May 17, 2026 13:33
Pre-existing rustfmt drift in the example files blocks the workspace-wide `cargo fmt --check` step in the pre-commit hook. Re-formatting in a separate commit so the substantive Layer 0 cursor fix (Refs #1562) stays scoped to the orchestrator. Refs #1562 (Gitea)
…ccurrence cursor

Adds `last_compound_review_fired_at: Option<DateTime<Utc>>` to `AgentOrchestrator` and rewrites the compound-review branch of `check_cron_schedules` so the cursor is recorded **before** the `.await` on `handle_schedule_event`. Mirrors the per-agent `last_cron_fire` pattern at the start of the same function.

Why: `reconcile_tick` is wrapped in a 90 s `tokio::time::timeout` safety net. When the future is cancelled mid-await, `last_tick_time` is never updated, so the previous `should_fire` check kept returning true on every subsequent tick, spawning a fresh review worktree every 30 s (the bigbox storm). Recording the cursor synchronously before the await makes cancellation safe: the next iteration sees the cursor and short-circuits via `already_fired = fire_time <= prev`.

Verified via new regression test `test_compound_review_cursor_advances_on_cancellation`: plants a `last_tick_time` 2 h in the past, runs `check_cron_schedules` twice without advancing wall-clock, and asserts the cursor is `Some(_)` after the first call and unchanged after the second.

Note on design drift: the design doc (docs/design/adf-worktree-lifecycle-design.md section 4.1) used `take_while(|t| *t <= now)` plus an explicit `already_fired` gate. I initially collapsed both into `compound_sched.after(&cursor).next()` for brevity but reverted to the documented shape after the regression test exposed the catch-up vs gating semantic difference -- the design's `last_tick_time`-anchored `next_fire` plus separate cursor gate is the correct read of "same occurrence, do not re-fire". Line numbers in the design (`:241`, `:817`, `:7137-7161`, `:7712`) all matched current source unchanged.

Layer 0 of the ADF worktree lifecycle epic (Gitea #1567). Out of scope: the per-agent cron path at lib.rs:7484, the WorktreeGuard refactor (Layer 1, Gitea #1569), startup sweep (Layer 2, #1570), and the adf-cleanup.sh hardening (Layer 3, #1571).

Refs #1562 (Gitea)
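For readers without the source open, the cursor-before-await shape described in this commit can be sketched with the standard library alone. This is a minimal model, not the orchestrator code: timestamps are plain `u64` seconds, the schedule is a hypothetical "every `period` seconds" iterator named `occurrences_after`, `std::future::pending` stands in for a `handle_schedule_event` call that never completes before the timeout, and `poll_once_then_cancel` is an illustrative helper that imitates what a timeout does to a pending future (it drops it). Picking the first due occurrence with `.next()` is an assumption about how `next_fire` is derived.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` once, then drops it: the same effect a timeout has on a
/// future that is still pending. Returns true if the future completed.
fn poll_once_then_cancel<F: Future>(fut: F) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx).is_ready()
}

/// Hypothetical schedule: one occurrence at every multiple of `period`.
fn occurrences_after(t: u64, period: u64) -> impl Iterator<Item = u64> {
    let first = (t / period + 1) * period;
    (0u64..).map(move |i| first + i * period)
}

struct Orchestrator {
    last_tick_time: u64,
    last_compound_review_fired_at: Option<u64>,
    reviews_started: u32,
}

impl Orchestrator {
    async fn check_cron_schedules(&mut self, now: u64, period: u64) {
        // next_fire is anchored on last_tick_time and bounded by now.
        let next_fire = occurrences_after(self.last_tick_time, period)
            .take_while(|t| *t <= now)
            .next();
        let fire_time = match next_fire {
            Some(t) => t,
            None => return, // nothing due yet
        };
        // Separate gate: same occurrence, do not re-fire.
        let already_fired = self
            .last_compound_review_fired_at
            .map_or(false, |prev| fire_time <= prev);
        if already_fired {
            return;
        }
        // Record the cursor synchronously, before the first await point.
        self.last_compound_review_fired_at = Some(fire_time);
        self.reviews_started += 1;
        // Stand-in for an event handler that outlives the timeout.
        std::future::pending::<()>().await;
    }
}

/// Mirrors the regression test's shape: stale last_tick_time, two calls at
/// the same wall-clock time, each cancelled if still pending.
fn run_twice_with_cancellation() -> (Option<u64>, Option<u64>, u32) {
    let period = 3600;
    let now = 7200; // last_tick_time (0) is 2 h in the past
    let mut orch = Orchestrator {
        last_tick_time: 0,
        last_compound_review_fired_at: None,
        reviews_started: 0,
    };
    poll_once_then_cancel(orch.check_cron_schedules(now, period));
    let after_first = orch.last_compound_review_fired_at;
    poll_once_then_cancel(orch.check_cron_schedules(now, period));
    (after_first, orch.last_compound_review_fired_at, orch.reviews_started)
}
```

In this model the first call records the cursor and is then dropped while pending; the second call computes the same `fire_time`, finds it at or before the cursor, and returns without starting another review. Moving the cursor assignment below the `.await` would leave it at `None` after cancellation, so the second call would fire again.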
Summary
Layer 0 of the ADF worktree-lifecycle hardening epic (Gitea #1567). Fixes the schedule-cursor bug that caused the 2026-05-17 compound-review firing storm on bigbox (memory 99.9G/100G, 692 leaked worktrees, no verdict posted to Gitea #514 during the storm).
Root cause: `self.last_tick_time` (`lib.rs:5688`) is updated only at the end of `reconcile_tick`. When the `tokio::time::timeout(90s, ...)` wrapper at `lib.rs:1288` cancels the tick, the update is skipped, and the next tick re-fires the same past cron occurrence.

Fix: dedicated `last_compound_review_fired_at: Option<DateTime<Utc>>` cursor advanced inside the fire branch before `.await`. Cursor advance survives future cancellation; cron check at `lib.rs:7137-61` now uses this cursor instead of `last_tick_time`.

Test plan
- `cargo test -p terraphim_orchestrator test_compound_review_cursor_advances_on_cancellation` -- passes.
- `cargo test -p terraphim_orchestrator` -- 682 lib tests + integration suites pass.
- `cargo clippy -p terraphim_orchestrator --lib --tests -- -D warnings` -- clean.

Refs terraphim/terraphim-ai#1562 (Gitea)
Refs terraphim/terraphim-ai#1567 (Gitea epic)
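To make the root cause concrete, here is a small synchronous model of the pre-fix behaviour. It is a sketch under stated assumptions rather than the orchestrator code: times are `u64` seconds, the schedule is a hypothetical hourly one, `BuggyOrchestrator` and `simulate_ticks` are illustrative names, and the `cancelled` flag stands in for the timeout dropping the tick future before it reaches the `last_tick_time` update.

```rust
/// Hypothetical schedule: occurrences at every multiple of `period` seconds.
fn next_occurrence_after(t: u64, period: u64) -> u64 {
    (t / period + 1) * period
}

/// Pre-fix shape: the only memory of "already fired" is last_tick_time,
/// which is written at the very end of the tick.
struct BuggyOrchestrator {
    last_tick_time: u64,
    worktrees_spawned: u32,
}

impl BuggyOrchestrator {
    fn reconcile_tick(&mut self, now: u64, period: u64, cancelled: bool) {
        let fire_time = next_occurrence_after(self.last_tick_time, period);
        if fire_time <= now {
            // The review starts (and its worktree exists) before the await.
            self.worktrees_spawned += 1;
            if cancelled {
                // Models the timeout dropping the future mid-await:
                // nothing below this point runs.
                return;
            }
        }
        // Skipped on cancellation, so the same occurrence stays "due".
        self.last_tick_time = now;
    }
}

/// Runs `ticks` ticks 30 s apart, starting exactly on a cron occurrence,
/// and returns how many worktrees were spawned.
fn simulate_ticks(ticks: u32, cancelled: bool) -> u32 {
    let period = 3600;
    let mut orch = BuggyOrchestrator {
        last_tick_time: 3000,
        worktrees_spawned: 0,
    };
    let mut now = 3600;
    for _ in 0..ticks {
        orch.reconcile_tick(now, period, cancelled);
        now += 30;
    }
    orch.worktrees_spawned
}
```

With every tick cancelled, `simulate_ticks(5, true)` spawns one worktree per tick, which is the storm pattern described above; with ticks that complete, `simulate_ticks(5, false)` spawns exactly one, because `last_tick_time` moves past the occurrence.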
Includes a `style(terraphim_multi_agent)` prerequisite commit (rustfmt fixes on `crates/terraphim_multi_agent/examples/*.rs`) absorbing pre-existing drift on `main`; required because the pre-commit hook gates on workspace-wide `cargo fmt --check`. Coordinate with the parallel #1558 WIP on those files.