fix(autonomous): persist PolicyEngine state across pause/resume (AUDIT-C3) #257
xlabtg merged 3 commits into xlabtg:main
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: xlabtg#256
…T-C3)

Before this change, AutonomousTaskManager.runLoop() constructed a fresh PolicyEngine on every start *and* resume, so each pauseTask()+resumeTask() cycle wiped toolCallTimestamps, apiCallTimestamps, recentActions, and consecutiveUncertainCount. That made it trivial to script pause/resume to bypass the 100 tool-calls-per-hour rate limit, the 5-identical-actions loop detector, and the uncertainty escalator.

Now the loop hydrates the PolicyEngine from a new `policy_state` table keyed by task_id and registers a write-through callback that flushes the snapshot on every mutation (`recordToolCall`, `recordApiCall`, `recordUncertain`, `resetUncertainCount`, `recordAction`). The snapshot is cleared only on terminal states (completed / failed / cancelled); paused tasks keep their state for the next resume.

- `PolicyEngine.serialize()` / `hydrate()` / `setOnStateChange()` expose a stable snapshot contract (`PolicyEngineState`) that also owns `recentActions`, which the loop used to track in an in-memory field.
- `AutonomousTaskStore.savePolicyState()` / `getPolicyState()` / `clearPolicyState()` back the new table.
- Schema migration 1.23.0 adds `policy_state (task_id, state, updated_at)` with FK → autonomous_tasks and ON DELETE CASCADE.
- `CURRENT_SCHEMA_VERSION` bumped to 1.23.0; `package.json` bumped to 0.8.11 for the release workflow.
- Regression tests (11 new + 1 manager-level integration test) cover 10 pause/resume cycles, identical-action persistence, uncertainty carry-over, terminal cleanup, and mid-run snapshotting.

Closes xlabtg#256
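The hydrate-then-write-through flow described in this commit message can be sketched roughly as follows. This is an illustration only, not the project's code: `SketchPolicyEngine`, `sketchStore`, and `startSketchEngine` are hypothetical names, a `Map` stands in for the `policy_state` table, and only the snapshot field names and method names are taken from the message above.

```typescript
// Snapshot shape; field names follow the commit message, the types are assumed.
interface PolicyEngineState {
  toolCallTimestamps: number[];
  apiCallTimestamps: number[];
  recentActions: string[];
  consecutiveUncertainCount: number;
}

// Hypothetical stand-in for the real PolicyEngine.
class SketchPolicyEngine {
  private state: PolicyEngineState = {
    toolCallTimestamps: [],
    apiCallTimestamps: [],
    recentActions: [],
    consecutiveUncertainCount: 0,
  };
  private onStateChange: ((snapshot: PolicyEngineState) => void) | null = null;

  // Return a deep-enough copy so callers cannot mutate internal arrays.
  serialize(): PolicyEngineState {
    return {
      toolCallTimestamps: [...this.state.toolCallTimestamps],
      apiCallTimestamps: [...this.state.apiCallTimestamps],
      recentActions: [...this.state.recentActions],
      consecutiveUncertainCount: this.state.consecutiveUncertainCount,
    };
  }

  // Replace internal state with a copy of a previously saved snapshot.
  hydrate(snapshot: PolicyEngineState): void {
    this.state = {
      toolCallTimestamps: [...snapshot.toolCallTimestamps],
      apiCallTimestamps: [...snapshot.apiCallTimestamps],
      recentActions: [...snapshot.recentActions],
      consecutiveUncertainCount: snapshot.consecutiveUncertainCount,
    };
  }

  setOnStateChange(cb: (snapshot: PolicyEngineState) => void): void {
    this.onStateChange = cb;
  }

  recordToolCall(now: number = Date.now()): void {
    this.state.toolCallTimestamps.push(now);
    this.notify();
  }

  recordUncertain(): void {
    this.state.consecutiveUncertainCount += 1;
    this.notify();
  }

  resetUncertainCount(): void {
    if (this.state.consecutiveUncertainCount === 0) return; // nothing changed, skip the write
    this.state.consecutiveUncertainCount = 0;
    this.notify();
  }

  // Write-through: hand a fresh snapshot to the registered callback.
  private notify(): void {
    if (this.onStateChange) this.onStateChange(this.serialize());
  }
}

// A Map keyed by task id stands in for the policy_state table (assumption).
const sketchStore = new Map<string, string>();

// Called on both start and resume: hydrate first, then wire the callback.
function startSketchEngine(taskId: string): SketchPolicyEngine {
  const engine = new SketchPolicyEngine();
  const saved = sketchStore.get(taskId);
  if (saved !== undefined) {
    engine.hydrate(JSON.parse(saved) as PolicyEngineState);
  }
  engine.setOnStateChange((s) => sketchStore.set(taskId, JSON.stringify(s)));
  return engine;
}
```

In this sketch, constructing a new engine on resume is harmless because the counters come back from the store before any policy check runs.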
Solution summary

Fixed AUDIT-C3 (issue #256): pause/resume no longer bypasses …

This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $7.929922

📊 Context and tokens usage: Total: (169.4K + 11.6M cached) input tokens, 43.8K output tokens, $7.929922 cost

📎 Log file uploaded as Gist (3010KB)

The working session has now ended; feel free to review and add any feedback on the solution draft.
🔄 Auto-restart triggered (iteration 1)

Reason: Merge conflicts detected

Starting new session to address the issues. Auto-restart-until-mergeable mode is active. Will continue until PR becomes mergeable.
Resolve conflicts in loop.ts, manager.test.ts, and .gitkeep.

Integrate the AUDIT-H4 safeUpdateStatus guard (PR xlabtg#267) with the issue xlabtg#256 PolicyEngine persistence work:

- loop.ts: apply safeUpdateStatus on failure/crash paths; hydrate PolicyEngine from policy_state and wire the write-through callback before marking the task running; guard clearStateOnTerminal() so a racy external pauseTask() keeps its policy snapshot.
- manager.test.ts: keep both the issue xlabtg#256 rehydration regression test and the AUDIT-H4 race tests.
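One way to picture the guard mentioned for clearStateOnTerminal() is to re-read the task status immediately before clearing and skip the clear unless the status is terminal. The sketch below assumes that approach; `InMemoryTaskStore`, `clearPolicyStateIfTerminal`, and the status type are hypothetical, and the real guard in loop.ts may be implemented differently.

```typescript
type SketchTaskStatus = "running" | "paused" | "completed" | "failed" | "cancelled";

const TERMINAL_STATUSES = new Set<SketchTaskStatus>(["completed", "failed", "cancelled"]);

// Hypothetical in-memory stand-in for the task store and the policy_state table.
class InMemoryTaskStore {
  private statuses = new Map<string, SketchTaskStatus>();
  private policyState = new Map<string, string>();

  setStatus(taskId: string, status: SketchTaskStatus): void {
    this.statuses.set(taskId, status);
  }
  getStatus(taskId: string): SketchTaskStatus | undefined {
    return this.statuses.get(taskId);
  }
  savePolicyState(taskId: string, state: string): void {
    this.policyState.set(taskId, state);
  }
  getPolicyState(taskId: string): string | undefined {
    return this.policyState.get(taskId);
  }
  clearPolicyState(taskId: string): void {
    this.policyState.delete(taskId);
  }
}

// Clear the snapshot only if the task is still in a terminal state right now.
// Returns true when the snapshot was cleared.
function clearPolicyStateIfTerminal(store: InMemoryTaskStore, taskId: string): boolean {
  const status = store.getStatus(taskId);
  if (status === undefined || !TERMINAL_STATUSES.has(status)) {
    return false; // e.g. an external pause won the race, so keep the snapshot
  }
  store.clearPolicyState(taskId);
  return true;
}

// Usage: a pause that lands just before cleanup keeps the snapshot.
const demoStore = new InMemoryTaskStore();
demoStore.setStatus("task-1", "running");
demoStore.savePolicyState("task-1", JSON.stringify({ consecutiveUncertainCount: 2 }));

demoStore.setStatus("task-1", "paused");
const clearedWhilePaused = clearPolicyStateIfTerminal(demoStore, "task-1");
const snapshotAfterPause = demoStore.getPolicyState("task-1");

demoStore.setStatus("task-1", "completed");
const clearedWhenCompleted = clearPolicyStateIfTerminal(demoStore, "task-1");
const snapshotAfterCompletion = demoStore.getPolicyState("task-1");
```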
🔄 Auto-restart-until-mergeable Log (iteration 1)

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $3.157243

📊 Context and tokens usage: Total: (84.7K + 4.3M cached) input tokens, 19.6K output tokens, $3.157243 cost

📎 Log file uploaded as Gist (4774KB)

The working session has now ended; feel free to review and add any feedback on the solution draft.
Summary
Closes #256 (AUDIT-C3).

Before this change, `AutonomousTaskManager.runLoop()` constructed a brand-new `PolicyEngine` on every start and resume, so every `pauseTask()` + `resumeTask()` cycle silently reset:

- `toolCallTimestamps` → bypassing `rateLimit.toolCallsPerHour` (default 100)
- `apiCallTimestamps` → bypassing `rateLimit.apiCallsPerMinute` (default 30)
- `recentActions` → disarming the 5-identical-actions loop detector
- `consecutiveUncertainCount` → disarming the uncertainty escalator

The loop now hydrates a single `PolicyEngine` from a new `policy_state` table keyed by `task_id`, and registers a write-through callback that flushes a snapshot on every mutation. Snapshots are cleared only on terminal task states (`completed` / `failed` / `cancelled`); paused tasks keep their state for the next resume.

Changes
Persistence

- `policy_state (task_id PK, state JSON, updated_at, FK → autonomous_tasks ON DELETE CASCADE)`
- `1.23.0` (`src/memory/schema.ts` + `src/memory/migrations/1.23.0.sql`)
- `AutonomousTaskStore.savePolicyState()` / `getPolicyState()` / `clearPolicyState()`

PolicyEngine

- `PolicyEngineState` interface (`toolCallTimestamps`, `apiCallTimestamps`, `consecutiveUncertainCount`, `recentActions`)
- `serialize()`, `hydrate()`, `setOnStateChange()`, `recordAction()`, `getRecentActions()` methods
- `recordToolCall` / `recordApiCall` / `recordUncertain` / `resetUncertainCount` / `recordAction` now trigger the `onStateChange` callback. `resetUncertainCount` no-ops (and does not write to DB) when the counter is already zero, which is the common non-stuck path.

Loop

- `AutonomousLoop.run()` hydrates the engine from `policy_state` before the first policy check and wires the write-through callback.
- `recentActions` moved from an in-memory field on the loop into the engine so it is covered by the same snapshot contract.
- Terminal paths (`completed`, `failed`, `cancelled`, `max_iterations`, `rate_limit`, planning failure, crash) clear `policy_state`. The abort path (e.g. graceful stop during pause) preserves state.

Housekeeping

- `CURRENT_SCHEMA_VERSION` bumped to `1.23.0`
- `package.json` version bumped to `0.8.11` (to trigger the release workflow)
- `CHANGELOG.md` entry under Unreleased / Fixed

Acceptance criteria coverage (from the issue)

- `policy_state` created (migration 1.23.0 + schema entry).
- `PolicyEngine` is hydrated from storage on resume; state is persisted on every `record*` call.
- `toolCallsPerHour`.
- `consecutiveUncertainCount` is not cleared by pause/resume.
- `src/autonomous/__tests__/`.

Reproduction
Before the fix, this would let an agent make unlimited tool calls:
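A minimal sketch of that scenario, using a hypothetical in-memory rate limiter rather than the project's actual classes. `SketchRateLimiter` and `runCycles` are illustrative names; only the 100-calls-per-hour default comes from the description above. With `persist` set to `false`, every simulated resume starts from an empty limiter, which mirrors the pre-fix behavior.

```typescript
const TOOL_CALLS_PER_HOUR = 100; // default limit quoted in the PR description
const HOUR_MS = 3_600_000;

// Hypothetical sliding-window limiter standing in for the PolicyEngine check.
class SketchRateLimiter {
  private timestamps: number[];

  constructor(initial: number[] = []) {
    this.timestamps = [...initial];
  }

  // Returns true and records the call if the hourly budget allows it.
  tryToolCall(now: number): boolean {
    const recent = this.timestamps.filter((t) => now - t < HOUR_MS);
    if (recent.length >= TOOL_CALLS_PER_HOUR) {
      this.timestamps = recent;
      return false;
    }
    recent.push(now);
    this.timestamps = recent;
    return true;
  }

  snapshot(): number[] {
    return [...this.timestamps];
  }
}

// Each cycle models one pause/resume: a new limiter is constructed, optionally
// hydrated from the snapshot saved at the end of the previous cycle.
function runCycles(cycles: number, callsPerCycle: number, persist: boolean): number {
  let saved: number[] = [];
  let allowed = 0;
  let now = 0; // simulated clock in milliseconds, well within one hour
  for (let c = 0; c < cycles; c++) {
    const limiter = new SketchRateLimiter(persist ? saved : []);
    for (let i = 0; i < callsPerCycle; i++) {
      if (limiter.tryToolCall(now)) allowed++;
      now++;
    }
    saved = limiter.snapshot();
  }
  return allowed;
}

const allowedWithoutPersistence = runCycles(10, 100, false); // 1000: the limit resets every cycle
const allowedWithPersistence = runCycles(10, 100, true); // 100: the limit holds across cycles
```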
After the fix, the rate-limit check trips on attempt #101 regardless of how many pause/resume cycles are injected. The new test `tool-call rate limit still fires after 10 pause/resume cycles` encodes exactly that scenario.

Test plan

- `npm run typecheck`: clean
- `npm run lint`: clean (max-warnings 0)
- `npm run format:check`: clean
- `npm test`: 2942 passed / 140 files
- `src/autonomous/__tests__/policy-persistence.test.ts`: 11/11 passed