fix(autonomous): persist PolicyEngine state across pause/resume (AUDIT-C3) by konard · Pull Request #257 · xlabtg/teleton-agent

konard · 2026-04-22T19:55:49Z

Summary

Closes #256 (AUDIT-C3). Before this change, AutonomousTaskManager.runLoop() constructed a brand-new PolicyEngine on every start and resume, so every pauseTask() + resumeTask() cycle silently reset:

toolCallTimestamps → bypassing rateLimit.toolCallsPerHour (default 100)
apiCallTimestamps → bypassing rateLimit.apiCallsPerMinute (default 30)
recentActions → disarming the 5-identical-actions loop detector
consecutiveUncertainCount → disarming the uncertainty escalator
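A sliding-window check shows why dropping these arrays is enough to defeat the limits. The sketch below is illustrative only and is not the project's code. The helper `withinHourlyLimit` and its parameters are hypothetical, and only the default of 100 tool calls per hour comes from this PR. An empty timestamp array, which is what a freshly constructed engine starts with, always passes the check.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: true if one more tool call fits in the trailing one-hour window.
function withinHourlyLimit(timestamps: number[], limit: number, now: number): boolean {
  // Only calls newer than one hour count; older entries have slid out of the window.
  const recent = timestamps.filter((t) => now - t < HOUR_MS);
  return recent.length < limit;
}

const now = Date.now();
// 100 calls spread over the last ~100 seconds: the default budget is exhausted.
const calls = Array.from({ length: 100 }, (_, i) => now - i * 1000);

console.log(withinHourlyLimit(calls, 100, now)); // false: limit reached
// A brand-new engine starts with an empty array, so the same check passes again.
console.log(withinHourlyLimit([], 100, now)); // true: budget silently reset
```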

The loop now hydrates a single PolicyEngine from a new policy_state table keyed by task_id, and registers a write-through callback that flushes a snapshot on every mutation. Snapshots are cleared only on terminal task states (completed / failed / cancelled); paused tasks keep their state for the next resume.
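The hydrate-then-write-through pattern can be sketched as follows. This is a simplified model under stated assumptions, not the PR's implementation. `EngineSnapshot`, `TinyEngine`, `startRun`, and the `Map`-backed store are hypothetical stand-ins for `PolicyEngineState`, `PolicyEngine`, `AutonomousLoop.run()`, and the SQLite `policy_state` table, and only one of the four state fields is modelled.

```typescript
// Hypothetical, trimmed-down snapshot: only tool-call timestamps.
interface EngineSnapshot {
  toolCallTimestamps: number[];
}

class TinyEngine {
  private toolCallTimestamps: number[] = [];
  private onStateChange?: (s: EngineSnapshot) => void;

  hydrate(s: EngineSnapshot): void {
    this.toolCallTimestamps = [...s.toolCallTimestamps];
  }
  serialize(): EngineSnapshot {
    return { toolCallTimestamps: [...this.toolCallTimestamps] };
  }
  setOnStateChange(cb: (s: EngineSnapshot) => void): void {
    this.onStateChange = cb;
  }
  recordToolCall(now: number): void {
    this.toolCallTimestamps.push(now);
    this.onStateChange?.(this.serialize()); // write-through on every mutation
  }
}

// Stand-in for the policy_state table: task_id -> JSON snapshot.
const policyState = new Map<string, string>();

// One engine per run, hydrated before the first policy check.
function startRun(taskId: string): TinyEngine {
  const engine = new TinyEngine();
  const saved = policyState.get(taskId);
  if (saved !== undefined) engine.hydrate(JSON.parse(saved) as EngineSnapshot);
  engine.setOnStateChange((s) => policyState.set(taskId, JSON.stringify(s)));
  return engine;
}

const first = startRun("task-1");
first.recordToolCall(1);
first.recordToolCall(2);
// Pause, then resume: a new engine is built, but it picks up the saved snapshot.
const resumed = startRun("task-1");
console.log(resumed.serialize().toolCallTimestamps); // [ 1, 2 ]
```

Building a new engine per run is harmless here because the snapshot, not the object, is the source of truth.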

Changes

Persistence

New table policy_state (task_id PK, state JSON, updated_at, FK → autonomous_tasks ON DELETE CASCADE)
Schema migration 1.23.0 (src/memory/schema.ts + src/memory/migrations/1.23.0.sql)
AutonomousTaskStore.savePolicyState() / getPolicyState() / clearPolicyState()
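An in-memory model can illustrate the table contract: one row per task, a JSON `state`, an `updated_at` stamp, and rows that disappear with their parent task. This is a sketch under assumptions. The real `AutonomousTaskStore` is SQLite-backed, and the class `InMemoryPolicyStateStore`, the `addTask()`/`deleteTask()` helpers, and the method signatures shown are hypothetical.

```typescript
interface PolicyStateRow {
  state: string; // JSON-encoded snapshot, like the `state` column
  updatedAt: number; // epoch milliseconds, like `updated_at`
}

class InMemoryPolicyStateStore {
  private tasks = new Set<string>();
  private rows = new Map<string, PolicyStateRow>(); // keyed by task_id (PK)

  addTask(taskId: string): void {
    this.tasks.add(taskId);
  }

  savePolicyState(taskId: string, state: unknown): void {
    // Mirrors the FK: a snapshot may only exist for a known task.
    if (!this.tasks.has(taskId)) throw new Error(`unknown task ${taskId}`);
    // Upsert: one row per task, overwritten on every save.
    this.rows.set(taskId, { state: JSON.stringify(state), updatedAt: Date.now() });
  }

  getPolicyState<T>(taskId: string): T | null {
    const row = this.rows.get(taskId);
    return row ? (JSON.parse(row.state) as T) : null;
  }

  clearPolicyState(taskId: string): void {
    this.rows.delete(taskId);
  }

  // Emulates ON DELETE CASCADE: deleting the task also removes its snapshot.
  deleteTask(taskId: string): void {
    this.tasks.delete(taskId);
    this.rows.delete(taskId);
  }
}

const store = new InMemoryPolicyStateStore();
store.addTask("t1");
store.savePolicyState("t1", { consecutiveUncertainCount: 2 });
console.log(store.getPolicyState<{ consecutiveUncertainCount: number }>("t1")); // { consecutiveUncertainCount: 2 }
store.deleteTask("t1");
console.log(store.getPolicyState("t1")); // null
```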

PolicyEngine

New PolicyEngineState interface (toolCallTimestamps, apiCallTimestamps, consecutiveUncertainCount, recentActions)
New serialize(), hydrate(), setOnStateChange(), recordAction(), getRecentActions() methods
recordToolCall / recordApiCall / recordUncertain / resetUncertainCount / recordAction now trigger the onStateChange callback. resetUncertainCount no-ops (and does not write to DB) when the counter is already zero, which is the common non-stuck path.
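A minimal sketch of the snapshot contract, assuming the four fields listed above and a synchronous callback. The field names follow the PR. The element types (epoch-ms numbers for timestamps, plain strings for actions), the class name `SketchPolicyEngine`, and the method bodies are assumptions. Note that `resetUncertainCount()` returns early when the counter is already zero, so the callback, and therefore the DB write, is skipped on the common path.

```typescript
// Field names follow the PR; element types are assumptions.
interface PolicyEngineState {
  toolCallTimestamps: number[]; // epoch ms
  apiCallTimestamps: number[]; // epoch ms
  consecutiveUncertainCount: number;
  recentActions: string[]; // action signature; format assumed
}

// Copy arrays so a snapshot never aliases the engine's live state.
function copyState(s: PolicyEngineState): PolicyEngineState {
  return {
    toolCallTimestamps: [...s.toolCallTimestamps],
    apiCallTimestamps: [...s.apiCallTimestamps],
    consecutiveUncertainCount: s.consecutiveUncertainCount,
    recentActions: [...s.recentActions],
  };
}

class SketchPolicyEngine {
  private state: PolicyEngineState = {
    toolCallTimestamps: [],
    apiCallTimestamps: [],
    consecutiveUncertainCount: 0,
    recentActions: [],
  };
  private onStateChange?: (s: PolicyEngineState) => void;

  setOnStateChange(cb: (s: PolicyEngineState) => void): void {
    this.onStateChange = cb;
  }
  serialize(): PolicyEngineState {
    return copyState(this.state);
  }
  hydrate(s: PolicyEngineState): void {
    this.state = copyState(s);
  }
  recordToolCall(now: number = Date.now()): void {
    this.state.toolCallTimestamps.push(now);
    this.emit();
  }
  recordApiCall(now: number = Date.now()): void {
    this.state.apiCallTimestamps.push(now);
    this.emit();
  }
  recordUncertain(): void {
    this.state.consecutiveUncertainCount++;
    this.emit();
  }
  resetUncertainCount(): void {
    // Common non-stuck path: nothing changed, so skip the callback and the DB write.
    if (this.state.consecutiveUncertainCount === 0) return;
    this.state.consecutiveUncertainCount = 0;
    this.emit();
  }
  recordAction(action: string): void {
    this.state.recentActions.push(action);
    this.emit();
  }
  getRecentActions(): string[] {
    return [...this.state.recentActions];
  }
  private emit(): void {
    this.onStateChange?.(this.serialize());
  }
}

let writes = 0;
const engine = new SketchPolicyEngine();
engine.setOnStateChange(() => writes++);
engine.resetUncertainCount(); // counter already 0: no write
engine.recordUncertain(); // write 1
engine.resetUncertainCount(); // write 2
console.log(writes); // 2
```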

Loop

AutonomousLoop.run() hydrates the engine from policy_state before the first policy check and wires the write-through callback.
recentActions moved from an in-memory field on the loop into the engine so it is covered by the same snapshot contract.
Terminal transitions (completed, failed, cancelled, max_iterations, rate_limit, planning failure, crash) clear policy_state. The abort path (e.g. graceful stop during pause) preserves state.
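The cleanup rule reduces to a small decision on how a run ends. The sketch below assumes a string union of run outcomes whose names mirror the list above. The type `RunOutcome`, the `"aborted"` label, and the function `finishRun` are hypothetical and not the loop's real API.

```typescript
type RunOutcome =
  | "completed"
  | "failed"
  | "cancelled"
  | "max_iterations"
  | "rate_limit"
  | "planning_failure"
  | "crash"
  | "aborted"; // e.g. graceful stop during pause

const TERMINAL: ReadonlySet<RunOutcome> = new Set<RunOutcome>([
  "completed",
  "failed",
  "cancelled",
  "max_iterations",
  "rate_limit",
  "planning_failure",
  "crash",
]);

// Returns true if the snapshot was cleared.
function finishRun(taskId: string, outcome: RunOutcome, clearPolicyState: (id: string) => void): boolean {
  if (TERMINAL.has(outcome)) {
    clearPolicyState(taskId); // the task will never resume, so drop its snapshot
    return true;
  }
  return false; // abort path: keep the snapshot for the next resume
}

const cleared: string[] = [];
finishRun("a", "completed", (id) => cleared.push(id));
finishRun("b", "aborted", (id) => cleared.push(id));
console.log(cleared); // [ 'a' ]
```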

Housekeeping

CURRENT_SCHEMA_VERSION bumped to 1.23.0
package.json version bumped to 0.8.11 (to trigger the release workflow)
CHANGELOG.md entry under Unreleased / Fixed

Acceptance criteria coverage (from the issue)

Migration / schema for policy_state created (migration 1.23.0 + schema entry).
PolicyEngine is hydrated from storage on resume; state is persisted on every record* call.
Unit test: 10 pause/resume cycles do not reset toolCallsPerHour.
Unit test: identical-action detector persists through pause/resume.
Unit test: consecutiveUncertainCount is not cleared by pause/resume.
Regression integration test added under src/autonomous/__tests__/.

Reproduction

Before the fix, this would let an agent make unlimited tool calls:

for (let i = 0; i < 1000; i++) {
  // burn through the 100/hour budget
  manager.pauseTask(id);
  manager.resumeTask(id); // fresh PolicyEngine — counter reset to 0
}

After the fix, the rate-limit check trips on attempt #101 regardless of how many pause/resume cycles are injected. The new test "tool-call rate limit still fires after 10 pause/resume cycles" encodes exactly that scenario.
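The scenario can be modelled without the real manager. In this sketch, each pause/resume is simulated by discarding the limiter and building a new one, either empty (old behaviour) or rehydrated from the last snapshot (the fix). `Limiter` and `firstRejectedAttempt` are hypothetical. Only the 100-calls-per-hour default and the "trips on attempt #101" expectation come from the PR.

```typescript
const HOUR_MS = 3_600_000;
const LIMIT = 100; // default rateLimit.toolCallsPerHour

class Limiter {
  constructor(public timestamps: number[] = []) {}

  tryToolCall(now: number): boolean {
    const recent = this.timestamps.filter((t) => now - t < HOUR_MS);
    if (recent.length >= LIMIT) return false; // rate limit trips
    this.timestamps.push(now);
    return true;
  }
}

// Returns the 1-based attempt that was first rejected, or null if none was.
function firstRejectedAttempt(persist: boolean, attempts: number): number | null {
  const now = Date.now(); // every call lands inside the same one-hour window
  let snapshot: number[] = [];
  let limiter = new Limiter();
  for (let attempt = 1; attempt <= attempts; attempt++) {
    if (!limiter.tryToolCall(now)) return attempt;
    snapshot = [...limiter.timestamps]; // write-through after each mutation
    // Pause + resume after every call: old code started empty, the fix rehydrates.
    limiter = persist ? new Limiter([...snapshot]) : new Limiter();
  }
  return null;
}

console.log(firstRejectedAttempt(true, 1000)); // 101
console.log(firstRejectedAttempt(false, 1000)); // null: the limit never fires
```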

Test plan

npm run typecheck — clean
npm run lint — clean (max-warnings 0)
npm run format:check — clean
npm test — 2942 passed / 140 files
New suite src/autonomous/__tests__/policy-persistence.test.ts — 11/11 passed
Manager regression test for issue audit-c3-pause-resume-policy-bypass #256 — passing

Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: xlabtg#256

…T-C3)

Before this change, AutonomousTaskManager.runLoop() constructed a fresh PolicyEngine on every start *and* resume, so each pauseTask()+resumeTask() cycle wiped toolCallTimestamps, apiCallTimestamps, recentActions, and consecutiveUncertainCount. That made it trivial to script pause/resume to bypass the 100 tool-calls-per-hour rate limit, the 5-identical-actions loop detector, and the uncertainty escalator.

Now the loop hydrates the PolicyEngine from a new `policy_state` table keyed by task_id and registers a write-through callback that flushes the snapshot on every mutation (`recordToolCall`, `recordApiCall`, `recordUncertain`, `resetUncertainCount`, `recordAction`). The snapshot is cleared only on terminal states (completed / failed / cancelled); paused tasks keep their state for the next resume.

- `PolicyEngine.serialize()` / `hydrate()` / `setOnStateChange()` expose a stable snapshot contract (`PolicyEngineState`) that also owns `recentActions`, which the loop used to track in an in-memory field.
- `AutonomousTaskStore.savePolicyState()` / `getPolicyState()` / `clearPolicyState()` back the new table.
- Schema migration 1.23.0 adds `policy_state (task_id, state, updated_at)` with FK → autonomous_tasks and ON DELETE CASCADE.
- `CURRENT_SCHEMA_VERSION` bumped to 1.23.0; `package.json` bumped to 0.8.11 for the release workflow.
- Regression tests (11 new + 1 manager-level integration test) cover 10 pause/resume cycles, identical-action persistence, uncertainty carry-over, terminal cleanup, and mid-run snapshotting.

Closes xlabtg#256

konard · 2026-04-22T20:18:18Z

Solution summary

PR #257 is now ready for review.

Summary

Fixed AUDIT-C3 (issue #256): pause/resume no longer bypasses PolicyEngine sliding-window limits.

Core change — AutonomousLoop.run() now hydrates its PolicyEngine from a new per-task policy_state table and writes a snapshot on every state mutation. A fresh PolicyEngine on resume used to wipe toolCallTimestamps, apiCallTimestamps, recentActions, and consecutiveUncertainCount; those now survive any number of pause/resume cycles. Terminal states clear the snapshot.

Files touched — policy-engine.ts (serialize/hydrate + callback), loop.ts (hydrate on start, wire callback, clear on terminal), memory/schema.ts + migration 1.23.0.sql (new policy_state table, version bump to 1.23.0), memory/agent/autonomous-tasks.ts (save/get/clear methods), package bump to 0.8.11, CHANGELOG entry, 11 new regression tests in src/autonomous/__tests__/policy-persistence.test.ts plus a manager-level integration test.

Verification — npm run typecheck, npm run lint, npm run format:check all clean; full suite 2942 / 2942 passing.

This summary was automatically extracted from the AI working session output.

konard · 2026-04-22T20:18:27Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $7.929922

📊 Context and tokens usage:

167.4K / 1M (17%) input tokens, 43.8K / 128K (34%) output tokens

Total: (169.4K + 11.6M cached) input tokens, 43.8K output tokens, $7.929922 cost

🤖 Models used:

Tool: Anthropic Claude Code
Requested: opus
Model: Claude Opus 4.7 (claude-opus-4-7)

📎 Log file uploaded as Gist (3010KB)


The working session has ended; feel free to review the solution draft and add feedback.

konard · 2026-04-22T20:20:33Z

🔄 Auto-restart triggered (iteration 1)

Reason: Merge conflicts detected

Starting new session to address the issues.

Auto-restart-until-mergeable mode is active. Will continue until PR becomes mergeable.

Resolve conflicts in loop.ts, manager.test.ts, and .gitkeep. Integrate the AUDIT-H4 safeUpdateStatus guard (PR xlabtg#267) with the issue xlabtg#256 PolicyEngine persistence work:

- loop.ts: apply safeUpdateStatus on failure/crash paths; hydrate PolicyEngine from policy_state and wire the write-through callback before marking the task running; guard clearStateOnTerminal() so a racy external pauseTask() keeps its policy snapshot.
- manager.test.ts: keep both the issue xlabtg#256 rehydration regression test and the AUDIT-H4 race tests.
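The race that the `clearStateOnTerminal()` guard addresses can be sketched as a compare-and-set. This assumes the loop may only move a task it still owns (status `running`) to a terminal status, and that the snapshot is cleared only when that write wins. `TaskStatus`, `TaskRecord`, and both function bodies are hypothetical, and the actual guard in loop.ts may be implemented differently.

```typescript
type TaskStatus = "running" | "paused" | "completed" | "failed" | "cancelled";

interface TaskRecord {
  status: TaskStatus;
  policyState: string | null; // JSON snapshot, or null once cleared
}

// Hypothetical guarded update: only transitions a task the loop still owns.
function safeUpdateStatus(task: TaskRecord, next: TaskStatus): boolean {
  if (task.status !== "running") return false; // e.g. an external pauseTask() won the race
  task.status = next;
  return true;
}

function clearStateOnTerminal(task: TaskRecord, next: TaskStatus): void {
  // Drop the snapshot only if the terminal status was actually written.
  if (safeUpdateStatus(task, next)) task.policyState = null;
}

// pauseTask() landed just before the loop hit its failure path.
const raced: TaskRecord = { status: "paused", policyState: '{"toolCallTimestamps":[1]}' };
clearStateOnTerminal(raced, "failed");
console.log(raced.status, raced.policyState !== null); // paused true
```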

konard · 2026-04-22T20:28:05Z

🔄 Auto-restart-until-mergeable Log (iteration 1)

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $3.157243

📊 Context and tokens usage:

87.3K / 1M (9%) input tokens, 19.6K / 128K (15%) output tokens

Total: (84.7K + 4.3M cached) input tokens, 19.6K output tokens, $3.157243 cost

🤖 Models used:

Tool: Anthropic Claude Code
Requested: opus
Model: Claude Opus 4.7 (claude-opus-4-7)

📎 Log file uploaded as Gist (4774KB)


The working session has ended; feel free to review the solution draft and add feedback.

konard added 2 commits April 22, 2026 19:55

Initial commit with task details

9911e59


konard changed the title ~~[WIP] audit-c3-pause-resume-policy-bypass~~ fix(autonomous): persist PolicyEngine state across pause/resume (AUDIT-C3) Apr 22, 2026

konard marked this pull request as ready for review April 22, 2026 20:18

xlabtg merged commit d69e581 into xlabtg:main Apr 22, 2026

konard mentioned this pull request Apr 22, 2026

docs(audit): add post-audit work report — summary of all 23 findings and fixes (#300) #301

Merged

3 tasks

xlabtg merged 3 commits into xlabtg:main from konard:issue-256-dbce37a6ca3e


2 participants
