
fix(orchestrator): compound-review schedule cursor (Layer 0) Refs terraphim/terraphim-ai#1562 (Gitea)#872

Merged
AlexMikhalev merged 2 commits into main from task/1562-schedule-cursor on May 17, 2026

@AlexMikhalev
Contributor

Summary

Layer 0 of the ADF worktree-lifecycle hardening epic (Gitea #1567). Fixes the schedule-cursor bug that caused the 2026-05-17 compound-review firing storm on bigbox (memory 99.9G/100G, 692 leaked worktrees, no verdict posted to Gitea #514 during the storm).

Root cause: self.last_tick_time (lib.rs:5688) is updated only at the end of reconcile_tick. When the tokio::time::timeout(90s, ...) wrapper at lib.rs:1288 cancels the tick, the update is skipped, and the next tick re-fires the same past cron occurrence.

Fix: a dedicated last_compound_review_fired_at: Option<DateTime<Utc>> cursor, advanced inside the fire branch before the .await. The cursor advance survives cancellation of the future; the cron check at lib.rs:7137-7161 now uses this cursor instead of last_tick_time.
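
The cursor gate described above can be sketched in isolation. This is a minimal illustration, not the orchestrator code: the `Scheduler` struct, the `due_occurrence` and `check_compound_review` helpers, the plain `u64` timestamps, and the fixed-interval schedule (in place of a cron expression and `DateTime<Utc>`) are all assumptions made to keep the example std-only.

```rust
/// Hypothetical stand-in for the orchestrator state. Only the two timestamps
/// discussed in this PR are modelled, as seconds since an arbitrary epoch.
struct Scheduler {
    last_tick_time: u64,
    last_compound_review_fired_at: Option<u64>,
    /// Assumption: a fixed interval replaces the real cron schedule.
    interval: u64,
}

impl Scheduler {
    /// First occurrence strictly after `last_tick_time` that is due at `now`.
    fn due_occurrence(&self, now: u64) -> Option<u64> {
        let next = (self.last_tick_time / self.interval + 1) * self.interval;
        (next <= now).then_some(next)
    }

    /// Returns true when a review should be started on this call.
    /// The cursor is recorded before the caller would await the handler,
    /// so a cancelled handler cannot cause the same occurrence to re-fire.
    fn check_compound_review(&mut self, now: u64) -> bool {
        let Some(fire_time) = self.due_occurrence(now) else {
            return false;
        };
        let already_fired = self
            .last_compound_review_fired_at
            .map_or(false, |prev| fire_time <= prev);
        if already_fired {
            return false;
        }
        self.last_compound_review_fired_at = Some(fire_time);
        true
    }
}
```

Calling `check_compound_review` twice with the same `now` and an unchanged `last_tick_time` fires once; the second call is short-circuited by the cursor, which is the property the regression test in this PR asserts.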

Test plan

  • cargo test -p terraphim_orchestrator test_compound_review_cursor_advances_on_cancellation -- passes.
  • cargo test -p terraphim_orchestrator -- 682 lib tests + integration suites pass.
  • cargo clippy -p terraphim_orchestrator --lib --tests -- -D warnings -- clean.
  • Phase 4 verification on bigbox per design §8.1 after merge.

Refs terraphim/terraphim-ai#1562 (Gitea)
Refs terraphim/terraphim-ai#1567 (Gitea epic)

Includes a style(terraphim_multi_agent) prerequisite commit (rustfmt fixes on crates/terraphim_multi_agent/examples/*.rs) absorbing pre-existing drift on main; required because the pre-commit hook gates on workspace-wide cargo fmt --check. Coordinate with the parallel #1558 WIP on those files.

Alex added 2 commits May 17, 2026 13:33
Pre-existing rustfmt drift in the example files blocks the
workspace-wide `cargo fmt --check` step in the pre-commit hook.
Re-formatting in a separate commit so the substantive Layer 0 cursor
fix (Refs #1562) stays scoped to the orchestrator.

Refs #1562 (Gitea)
…ccurrence cursor

Adds `last_compound_review_fired_at: Option<DateTime<Utc>>` to
`AgentOrchestrator` and rewrites the compound-review branch of
`check_cron_schedules` so the cursor is recorded **before** the
`.await` on `handle_schedule_event`. Mirrors the per-agent
`last_cron_fire` pattern at the start of the same function.

Why: `reconcile_tick` is wrapped in a 90 s `tokio::time::timeout`
safety net. When the future is cancelled mid-await, `last_tick_time`
is never updated, so the previous `should_fire` check kept returning
true on every subsequent tick, spawning a fresh review worktree every
30 s (the bigbox storm). Recording the cursor synchronously before the
await makes cancellation safe: the next iteration sees the cursor and
short-circuits via `already_fired = fire_time <= prev`.
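
The cancellation behaviour described above can be reproduced without tokio. The sketch below polls a future once by hand and then drops it, which is roughly what a timeout wrapper does when it expires. `Never`, `noop_waker`, `tick`, and `run_cancelled_tick` are hypothetical names for illustration only and do not appear in the orchestrator.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A future that never completes, standing in for a handler that outlives the timeout.
struct Never;

impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Minimal waker that does nothing; sufficient for polling by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // Safety: every vtable entry ignores the data pointer, so null is fine.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Hypothetical tick body: the cursor is written before the await,
/// `last_tick_time` only after it.
async fn tick(cursor: &Cell<Option<u64>>, last_tick_time: &Cell<u64>, now: u64) {
    cursor.set(Some(now)); // synchronous, runs on the first poll
    Never.await; // the future is dropped while suspended here
    last_tick_time.set(now); // never reached when cancelled
}

/// Polls one tick once, then drops it, and reports both timestamps.
fn run_cancelled_tick(now: u64) -> (Option<u64>, u64) {
    let cursor = Cell::new(None);
    let last_tick_time = Cell::new(0);
    {
        let mut fut = Box::pin(tick(&cursor, &last_tick_time, now));
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        assert!(fut.as_mut().poll(&mut cx).is_pending());
    } // `fut` is dropped here: this is the cancellation
    (cursor.get(), last_tick_time.get())
}
```

After `run_cancelled_tick(100)` the cursor holds `Some(100)` while `last_tick_time` is still `0`: state written before the await survives the drop, state written after it does not.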

Verified via new regression test
`test_compound_review_cursor_advances_on_cancellation`: plants a
`last_tick_time` 2 h in the past, runs `check_cron_schedules` twice
without advancing wall-clock, and asserts the cursor is `Some(_)` after
the first call and unchanged after the second.

Note on design drift: the design doc (docs/design/adf-worktree-lifecycle-design.md
section 4.1) used `take_while(|t| *t <= now)` plus an explicit
`already_fired` gate. I initially collapsed both into
`compound_sched.after(&cursor).next()` for brevity but reverted to the
documented shape after the regression test exposed the catch-up vs
gating semantic difference -- the design's `last_tick_time`-anchored
`next_fire` plus separate cursor gate is the correct read of "same
occurrence, do not re-fire". Line numbers in the design (`:241`,
`:817`, `:7137-7161`, `:7712`) all matched current source unchanged.
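
One way the two shapes can diverge is sketched below. It is an illustration only, under loud assumptions: a fixed-interval schedule replaces the cron expression, timestamps are plain `u64` seconds, the documented shape is taken to pick the first due occurrence, and the collapsed shape is taken to fall back to `last_tick_time` when the cursor is `None`. The helpers `after`, `gated`, and `collapsed` are hypothetical. This is neither the code in `check_cron_schedules` nor a reconstruction of the exact failure the regression test exposed.

```rust
/// Occurrences of a hypothetical fixed-interval schedule strictly after `t`.
fn after(t: u64, interval: u64) -> impl Iterator<Item = u64> {
    let first = (t / interval + 1) * interval;
    (0u64..).map(move |k| first + k * interval)
}

/// Documented shape: the occurrence is anchored at `last_tick_time`,
/// and a separate cursor gate decides whether it already fired.
fn gated(last_tick_time: u64, now: u64, cursor: &mut Option<u64>, interval: u64) -> bool {
    let next_fire = after(last_tick_time, interval)
        .take_while(|t| *t <= now)
        .next();
    match next_fire {
        Some(fire_time) => {
            let already_fired = cursor.map_or(false, |prev| fire_time <= prev);
            if already_fired {
                return false;
            }
            *cursor = Some(fire_time);
            true
        }
        None => false,
    }
}

/// Collapsed shape: the next occurrence is taken after the cursor itself,
/// so every overdue occurrence fires in turn (catch-up).
fn collapsed(last_tick_time: u64, now: u64, cursor: &mut Option<u64>, interval: u64) -> bool {
    let anchor = cursor.unwrap_or(last_tick_time);
    match after(anchor, interval).next() {
        Some(fire_time) if fire_time <= now => {
            *cursor = Some(fire_time);
            true
        }
        _ => false,
    }
}
```

With `last_tick_time = 0`, `now = 7200`, and a 1800 s interval, two consecutive calls to `gated` fire once and leave the cursor at `Some(1800)`, while two calls to `collapsed` fire twice and move the cursor to `Some(3600)`. Under these assumptions only the gated shape satisfies "cursor unchanged after the second call".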

Layer 0 of the ADF worktree lifecycle epic (Gitea #1567). Out of scope:
the per-agent cron path at lib.rs:7484, the WorktreeGuard refactor
(Layer 1, Gitea #1569), startup sweep (Layer 2, #1570), and the
adf-cleanup.sh hardening (Layer 3, #1571).

Refs #1562 (Gitea)
@AlexMikhalev AlexMikhalev merged commit 43c15ba into main May 17, 2026
8 of 11 checks passed
@AlexMikhalev AlexMikhalev deleted the task/1562-schedule-cursor branch May 17, 2026 13:54