feat(agents): durable tool execution + transactional resume [#2835] #2841
Merged
viniciusdacal merged 13 commits into main on Apr 19, 2026
Rev 2 of the design for @vertz/agents durable tool execution and transactional resume. Approved via three agent reviews (DX / Product / Technical) + human sign-off on 2026-04-19. Implementation is broken into five phase files; Phases 1-3 are the shippable MVP that closes the P0 correctness hole for side-effecting tool calls (no double-fire on resume after a crash between write phases).

Key design decisions locked:
- Activation is implicit: store + sessionId + non-memory store → durable. No flag.
- The tool opt-in is named `safeToRetry` (not `idempotent`) to avoid the Stripe-idempotency-key semantic collision; the default is side-effecting.
- The durability primitive is a single `AgentStore.appendMessagesAtomic()` method, with two atomic writes per step (pre-dispatch / post-dispatch). No `toolCallStatus` field: message history alone is the orphan sentinel.
- A memory store under durable execution throws `MemoryStoreNotDurableError` at run() entry, not lazily.
- `checkpointInterval` + `onCheckpoint` are deleted pre-v1, no shim.

Related: #2834 (Anthropic adapter, merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e guard [#2835]

Phase 1 Task 1 of the durable resume feature. Adds the durability primitive to the AgentStore contract so Phase 2 can rely on per-step atomic writes, and introduces MemoryStoreNotDurableError plus an isMemoryStore() brand so run() can fail fast when memoryStore + sessionId are combined.

- Extend AgentStore with appendMessagesAtomic(sessionId, messages, session). Implementations must run as one driver-level transaction over already-resolved data (no awaits between statements).
- memoryStore().appendMessagesAtomic() always throws MemoryStoreNotDurableError: memory is in-process and cannot provide the guarantee.
- sqlite-store + d1-store get stubbed throws; real implementations land in Phase 1 Tasks 2 & 3. The two no-throw-plain-error lint warnings are transient and disappear in the follow-up commits.
- Export MemoryStoreNotDurableError from the @vertz/agents public barrel.
- Export MEMORY_STORE_KIND + isMemoryStore() as module-level helpers for run.ts to consume in Phase 1 Task 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
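To make the Task 1 contract concrete, here is a minimal sketch of the store interface, the error, and a branded memory store. The `Message` and `Session` shapes are simplified assumptions, and representing `MEMORY_STORE_KIND` as a unique symbol is a guess at how the brand could work, not the package's actual encoding.

```typescript
// Sketch only: Message and Session are simplified stand-ins for the real
// @vertz/agents types, and the symbol brand is an assumed representation.
interface Message {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  toolCallId?: string;
  toolCalls?: { id: string; name: string }[];
}

interface Session {
  id: string;
  updatedAt: number;
}

interface AgentStore {
  // Contract: upsert the session and insert every message in ONE
  // driver-level transaction over already-resolved data.
  appendMessagesAtomic(sessionId: string, messages: Message[], session: Session): Promise<void>;
}

class MemoryStoreNotDurableError extends Error {
  constructor() {
    super('memoryStore() cannot provide durable per-step writes; use a durable store or drop sessionId');
    this.name = 'MemoryStoreNotDurableError';
  }
}

// Assumed brand: a unique symbol marks memory stores so run() can detect
// them structurally, without relying on instanceof.
const MEMORY_STORE_KIND = Symbol('vertz.memoryStore');

interface MemoryStore extends AgentStore {
  readonly [MEMORY_STORE_KIND]: true;
}

function isMemoryStore(store: AgentStore): store is MemoryStore {
  return (store as Partial<MemoryStore>)[MEMORY_STORE_KIND] === true;
}

function memoryStore(): MemoryStore {
  return {
    [MEMORY_STORE_KIND]: true,
    async appendMessagesAtomic(): Promise<void> {
      // In-process memory cannot survive a crash, so it never claims durability.
      throw new MemoryStoreNotDurableError();
    },
  };
}
```

A structural brand keeps the check cheap and works even if two copies of the package end up in one bundle, where `instanceof` would fail.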
Phase 1 Task 2. Real implementation of the durability primitive for the
SQLite store. Session upsert + all message inserts run inside a single
db.transaction(() => { ... }) callable, over already-resolved data — no
await inside the transaction (the @vertz/sqlite driver is sync). If any
statement throws, the whole transaction rolls back, so readers never see
partial state.
Covered by three tests:
- happy path: session + messages visible after one call
- rollback: a circular-reference toolCalls payload fails JSON.stringify
mid-batch; no messages land and session.updatedAt is unchanged
- monotonic seq: successive calls continue the sequence numbering
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
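The rollback behavior the SQLite tests check can be sketched with a tiny in-memory stand-in. `FakeSyncDb` is hypothetical: it models only the all-or-nothing semantics of a `db.transaction(fn)` callable (snapshot, run synchronously, restore on throw), not the real `@vertz/sqlite` driver API, and the row shape is an assumption.

```typescript
// Hypothetical stand-in for a synchronous SQLite handle: it models only
// the all-or-nothing semantics of db.transaction(fn). Row shapes are assumptions.
interface MessageRow {
  sessionId: string;
  seq: number;
  role: string;
  toolCalls: string | null;
}

class FakeSyncDb {
  rows: MessageRow[] = [];
  sessionUpdatedAt = new Map<string, number>();

  // Returns a callable that runs fn synchronously; on any throw, every
  // change made inside fn is rolled back before the error propagates.
  transaction(fn: () => void): () => void {
    return () => {
      const rowsBefore = [...this.rows];
      const sessionsBefore = new Map(this.sessionUpdatedAt);
      try {
        fn();
      } catch (err) {
        this.rows = rowsBefore;
        this.sessionUpdatedAt = sessionsBefore;
        throw err;
      }
    };
  }
}

interface Msg {
  role: string;
  toolCalls?: unknown;
}

function appendMessagesAtomic(db: FakeSyncDb, sessionId: string, messages: Msg[], updatedAt: number): void {
  const write = db.transaction(() => {
    db.sessionUpdatedAt.set(sessionId, updatedAt); // session upsert
    // Continue the per-session sequence from the highest committed seq.
    let seq = Math.max(0, ...db.rows.filter((r) => r.sessionId === sessionId).map((r) => r.seq));
    for (const m of messages) {
      seq += 1;
      db.rows.push({
        sessionId,
        seq,
        role: m.role,
        // JSON.stringify throws on circular data, aborting the whole batch.
        toolCalls: m.toolCalls === undefined ? null : JSON.stringify(m.toolCalls),
      });
    }
  });
  write(); // fully synchronous: no await can interleave between statements
}
```

Because the callable never awaits, no other work can observe the half-written batch before it either commits or rolls back.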
Phase 1 Task 3. Real implementation of the durability primitive for the Cloudflare D1 store. A single db.batch([...]) wraps the session upsert + every message INSERT in one implicit transaction. Each INSERT derives its seq from a subquery (COALESCE(MAX(seq), 0) + 1), so no pre-batch SELECT is required: D1 statements in the same batch see each other's writes, which gives the INSERTs monotonically increasing seq values. Because D1 batch() is documented as implicitly transactional (https://developers.cloudflare.com/d1/worker-api/prepared-statements/#batch-statements), the whole batch commits atomically or rolls back on any statement failure.

Covered by four tests:
- happy path: session + messages visible after one call
- monotonic seq across two successive atomic appends
- rollback: a failing batch() rejects, no partial state visible
- toolCall metadata (toolCallId, toolName, toolCalls) round-trips

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
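The batch construction described above might look roughly like the sketch below. `D1Like` and `PreparedStatement` are minimal structural types shaped after the D1 Worker API's `prepare().bind()` and `batch()`; the table and column names are assumptions, not the store's real schema.

```typescript
// Minimal structural types shaped after the D1 Worker API surface used here
// (prepare().bind() and batch()). Table and column names are assumptions.
interface PreparedStatement {
  bind(...values: unknown[]): PreparedStatement;
}

interface D1Like {
  prepare(sql: string): PreparedStatement;
  batch(statements: PreparedStatement[]): Promise<unknown[]>;
}

interface Msg {
  role: string;
  content: string;
}

const UPSERT_SESSION = `INSERT INTO sessions (id, updated_at) VALUES (?, ?)
  ON CONFLICT(id) DO UPDATE SET updated_at = excluded.updated_at`;

// seq is computed inside the statement, so no pre-batch SELECT is needed;
// each INSERT sees the rows written by earlier statements in the batch.
const INSERT_MESSAGE = `INSERT INTO messages (session_id, seq, role, content)
  VALUES (?, (SELECT COALESCE(MAX(seq), 0) + 1 FROM messages WHERE session_id = ?), ?, ?)`;

async function appendMessagesAtomic(db: D1Like, sessionId: string, messages: Msg[], updatedAt: number): Promise<void> {
  const statements = [
    db.prepare(UPSERT_SESSION).bind(sessionId, updatedAt),
    ...messages.map((m) => db.prepare(INSERT_MESSAGE).bind(sessionId, sessionId, m.role, m.content)),
  ];
  // A single batch() call: D1 runs it as one implicit transaction, so the
  // upsert and every INSERT commit together or not at all.
  await db.batch(statements);
}
```

Deriving `seq` in SQL is what lets the whole write be a single round trip with no read-then-write gap.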
Phase 1 Task 4. Fail fast at run() entry — before any LLM call or store
access — whenever a sessionId is paired with memoryStore(). The memory
store cannot provide the durable per-step writes that resume requires,
so silently allowing it would lose data on restart (especially for
chat-only agents that never call a tool and never exercise
appendMessagesAtomic otherwise).
- run() at the top of the hasStore branch checks isMemoryStore(store)
when sessionId is present and throws MemoryStoreNotDurableError
synchronously.
- Added three tests covering: tool-calling agent throws before LLM; chat-only
agent throws (no silent loss); memoryStore without sessionId still
works normally.
- Migrated 12 existing run.test.ts uses of memoryStore() + sessionId to
sqliteStore({ path: ':memory:' }). Equivalent in-process behavior,
transactional by construction.
- Same migration in create-agent-runner.test.ts (3 uses).
- types.test-d.ts left untouched — its memoryStore() call doesn't pass a
sessionId and is a pure type check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
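The entry guard can be sketched as follows. The `Store` shape with a `kind` discriminator, `RunOptions`, and the injected `callLlm` are hypothetical simplifications of the real `run()` signature; the point is only that the check runs synchronously, before any LLM call or store access.

```typescript
// Sketch of the entry guard. Store, RunOptions, and callLlm are
// hypothetical simplifications of the real run() signature.
interface Store {
  readonly kind: 'memory' | 'sqlite' | 'd1';
}

interface RunOptions {
  store?: Store;
  sessionId?: string;
}

class MemoryStoreNotDurableError extends Error {
  constructor() {
    super('memoryStore() + sessionId cannot be resumed durably');
    this.name = 'MemoryStoreNotDurableError';
  }
}

const isMemoryStore = (store: Store): boolean => store.kind === 'memory';

// Not `async`: the guard throws synchronously, before the LLM is called
// or the store is touched, so chat-only agents fail as loudly as
// tool-calling ones.
function run(opts: RunOptions, callLlm: () => Promise<string>): Promise<string> {
  if (opts.store !== undefined && opts.sessionId !== undefined && isMemoryStore(opts.store)) {
    throw new MemoryStoreNotDurableError();
  }
  return callLlm();
}
```

Memory stores without a `sessionId`, and durable stores with one, both pass straight through to the LLM.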
… [#2835]

Phase 1 Task 5. Adds the test-only subpath export '@vertz/agents/testing' and its first helper, crashAfterToolResults(store, failOnCallNumber = 2), used by durable-resume.test.ts to simulate a crash between the pre-dispatch write and the post-dispatch write. The helper wraps any AgentStore, counts appendMessagesAtomic calls, and throws a sentinel Error on the Nth call; all other methods pass through unchanged.

- src/testing/crash-harness.ts: the factory.
- src/testing/index.ts: barrel for the subpath.
- src/testing/crash-harness.test.ts: 4 tests covering pass-through behavior, Nth-call fail, and delegation of non-atomic methods.
- package.json exports: ./testing entry with dist/testing/index.{js,d.ts}.
- build.config.ts: add src/testing/index.ts as an entry so dts runs.

Verified: `dist/testing/index.js` and `dist/testing/index.d.ts` are produced by the build. Lint clean (the sentinel plain-Error throw has an inline disable comment with rationale). 227/227 agent tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
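One way to build such a wrapper is a `Proxy` that intercepts only `appendMessagesAtomic`. This is a sketch in the spirit of `crashAfterToolResults`, not its source: the two-method `AgentStore` is a simplified assumption, and it assumes only the Nth call fails while later calls pass through.

```typescript
// Sketch of a crash-injection wrapper in the spirit of crashAfterToolResults.
// The two-method AgentStore is a simplified assumption.
interface Message {
  role: string;
  content: string;
}

interface AgentStore {
  appendMessagesAtomic(sessionId: string, messages: Message[]): Promise<void>;
  loadMessages(sessionId: string): Promise<Message[]>;
}

function crashAfterToolResults<S extends AgentStore>(store: S, failOnCallNumber = 2): S {
  let atomicCalls = 0;
  return new Proxy(store, {
    get(target, prop, receiver) {
      if (prop === 'appendMessagesAtomic') {
        return async (sessionId: string, messages: Message[]): Promise<void> => {
          atomicCalls += 1;
          if (atomicCalls === failOnCallNumber) {
            // Sentinel: simulates the process dying before this write commits.
            throw new Error(`crash-harness: simulated crash on appendMessagesAtomic call #${atomicCalls}`);
          }
          return target.appendMessagesAtomic(sessionId, messages);
        };
      }
      // Every other member passes through, bound to the real store.
      const value: unknown = Reflect.get(target, prop, receiver);
      return typeof value === 'function' ? value.bind(target) : value;
    },
  });
}
```

A `Proxy` keeps the wrapper agnostic to whatever other methods a store implements, which matches the "all other methods pass through unchanged" requirement.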
Phase 1 Task 6. Lands the feature-level E2E test that drives Phases 2 and 3 to GREEN, plus a perf regression gate for the durable-resume write pattern.

__tests__/durable-resume.test.ts is the MVP contract. Three cases:

1. "Does not re-invoke handler" (THE key assertion). A scripted LLM asks for postSlack; the crash harness throws on the 2nd appendMessagesAtomic call (simulating a crash AFTER the handler dispatched). Resume with the same sessionId must NOT re-invoke the handler, and the stored message history must include a ToolDurabilityError tool_result.
   Currently RED: Phase 1 doesn't call appendMessagesAtomic in the loop, so the crash never fires. Phase 2 wires the atomic writes (still RED because there is no resume logic yet). Phase 3 surfaces the error → GREEN. The feature branch carries the intermediate RED commits by design; the final PR to main is green. .claude/rules/tdd.md forbids .skip, so the test stays un-skipped.
2. "memoryStore + sessionId throws at entry": GREEN already (Task 4 landed that path). Doubles as an integration assertion that the guard fires before any LLM call.
3. Type-level @ts-expect-error on `safeToRetry: true`: GREEN until Phase 4 adds the field. Once Phase 4 adds it, the directive becomes unused and fires, which forces this assertion to be flipped alongside the type wiring.

__tests__/durable-resume.perf.test.ts gates a 10-step scripted loop on sqliteStore(:memory:) under 200ms. Current measurement: ~4ms. If Phase 2's per-step atomic writes regress this badly, CI alarms.

Scope boundary: the current run() requires the session row to exist before run() is called with a fixed sessionId, so the test pre-seeds it via saveSession(). This matches the DO walkthrough pattern and is not a framework change in scope for Phase 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of durable tool execution. Wires appendMessagesAtomic into the loop so tool-calling steps commit durably in two atomic writes per step (assistant-with-toolCalls, then tool_results), and deletes the obsolete checkpointInterval + onCheckpoint pair.

reactLoop:
- Remove checkpointInterval + onCheckpoint from ReactLoopOptions (pre-v1, no shim).
- Add a persistStep callback: it fires with phase='assistant-with-tool-calls' after the assistant message is pushed, then with phase='tool-results' after all result messages for the step are pushed. If persistStep rejects, the error propagates out.

run.ts:
- Detect durable mode: store + sessionId + !isMemoryStore.
- When durable, provide a persistStep that calls appendMessagesAtomic with a fresh session snapshot on each write. The new user message is bundled into the FIRST persistStep call, so a crash before any step leaves no partial state (nothing persists until work begins).
- Skip the end-of-run saveSession + appendMessages pair when durable; flush any trailing messages (e.g. a text-only final assistant message) via one last atomic call.
- Non-durable path (stateless or no sessionId) unchanged.

Config:
- Delete checkpointInterval from AgentLoopConfig, the agent() defaults, and all tests that referenced the obsolete callback. The types.test-d.ts regression guard now asserts that checkpointInterval is rejected.

E2E test update:
- durable-resume.test.ts is still RED on the single "ToolDurabilityError surfaces in history" assertion; Phase 3 adds the resume logic that writes the synthetic error tool_result. Everything else in the file passes: the handler runs once pre-crash, the crash trips correctly on write #2, and the second run() completes cleanly (the scripted LLM says "Done" rather than re-requesting the tool).

232 pass, 1 fail (the planned Phase-3-target assertion), 0 lint/type regressions. The 8 pre-existing no-throw-plain-error warnings in react-loop.ts / react-loop.test.ts / agent.ts are unchanged by this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
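The two-writes-per-step pattern, including bundling the new user message into the first write, can be sketched like this. `runToolStep`, `makeDurablePersistStep`, and the `dispatch` callback are hypothetical helpers for illustration; only the phase names and the bundling rule come from the commit above.

```typescript
// Sketch of the two-writes-per-step pattern. runToolStep, dispatch, and
// makeDurablePersistStep are hypothetical, not the real reactLoop/run.ts code.
type Phase = 'assistant-with-tool-calls' | 'tool-results';

interface Message {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  toolCallId?: string;
  toolCalls?: { id: string; name: string }[];
}

type PersistStep = (phase: Phase, messages: Message[]) => Promise<void>;

// One tool-calling step. If persistStep rejects, the error propagates.
async function runToolStep(
  history: Message[],
  assistant: Message,
  dispatch: (toolName: string) => Promise<string>,
  persistStep: PersistStep,
): Promise<void> {
  history.push(assistant);
  // Write #1: record the intent before any handler runs.
  await persistStep('assistant-with-tool-calls', [assistant]);
  const results: Message[] = [];
  for (const call of assistant.toolCalls ?? []) {
    results.push({ role: 'tool', toolCallId: call.id, content: await dispatch(call.name) });
  }
  history.push(...results);
  // Write #2: record every outcome for the step in one atomic append.
  await persistStep('tool-results', results);
}

// Durable mode: bundle the new user message into the FIRST write, so a
// crash before any step leaves nothing persisted.
function makeDurablePersistStep(
  appendMessagesAtomic: (messages: Message[]) => Promise<void>,
  newUserMessage: Message,
): PersistStep {
  let pendingUser: Message | undefined = newUserMessage;
  return async (_phase, messages) => {
    const batch = pendingUser ? [pendingUser, ...messages] : messages;
    await appendMessagesAtomic(batch);
    pendingUser = undefined; // cleared only once the write has committed
  };
}
```

A crash between the two awaits leaves an assistant message whose tool calls have no results, which is exactly the signature resume looks for.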
Phase 3 of durable tool execution; the MVP completes here. When run() resumes a session whose last assistant message has unmatched tool_call ids (the crash signature for a side-effecting tool that ran but whose tool_result was lost before write #2 committed), the framework now:

1. Constructs a ToolDurabilityError per missing tool_call.
2. Serializes each as a `tool` role message (the same JSON shape the loop already uses for handler errors, plus a `kind: 'tool-durability-error'` discriminator so callers and LLMs can pattern-match).
3. Commits the synthetic tool_results atomically via appendMessagesAtomic before the loop's first LLM call.
4. Extends previousMessages with those synthetic rows so the LLM sees the error in-band and decides recovery (check external state, ask the user, abort: its call).

Also adds `findOrphanAssistantWithToolCalls()`, a pure message-history scan; no new schema column is required. The sentinel is "assistant with toolCalls + no matching tool_result", per the design's crash taxonomy.

Export:
- ToolDurabilityError class from the @vertz/agents barrel (so callers inspecting resumed history can pattern-match).

Tests:
- errors.test.ts: class shape + serialized encoding.
- durable-resume.test.ts: the main E2E is now fully GREEN. The handler runs exactly once across crash + resume; ToolDurabilityError surfaces in history with the correct toolName/toolCallId.

Total: 236/236 tests pass. Typecheck clean. Lint clean on Phase 3 additions. Phases 1–3 = MVP. Phase 4 adds the safeToRetry opt-in; Phase 5 ships docs + changeset + PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
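The orphan scan and the synthetic error rows can be sketched as below. The return shape of this `findOrphanAssistantWithToolCalls` variant and the exact error JSON (beyond the `kind` discriminator named above) are assumptions; the message types are simplified.

```typescript
// Sketch of the resume-time orphan scan and synthetic error rows. The
// return shape and the error JSON (beyond `kind`) are assumptions.
interface ToolCall {
  id: string;
  name: string;
}

interface Message {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  toolCallId?: string;
  toolName?: string;
  toolCalls?: ToolCall[];
}

class ToolDurabilityError extends Error {
  readonly toolName: string;
  readonly toolCallId: string;
  constructor(toolName: string, toolCallId: string) {
    super(`Tool "${toolName}" (call ${toolCallId}) may have run before a crash, but its result was never persisted.`);
    this.name = 'ToolDurabilityError';
    this.toolName = toolName;
    this.toolCallId = toolCallId;
  }
}

// Pure history scan: look only at the LAST assistant message; any of its
// tool calls without a later matching tool message are orphans.
function findOrphanAssistantWithToolCalls(history: Message[]): { assistant: Message; missing: ToolCall[] } | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const msg = history[i];
    if (msg.role !== 'assistant') continue;
    const answered = new Set(history.slice(i + 1).filter((m) => m.role === 'tool').map((m) => m.toolCallId));
    const missing = (msg.toolCalls ?? []).filter((c) => !answered.has(c.id));
    return missing.length > 0 ? { assistant: msg, missing } : undefined;
  }
  return undefined;
}

function toDurabilityErrorMessage(call: ToolCall): Message {
  const err = new ToolDurabilityError(call.name, call.id);
  return {
    role: 'tool',
    toolCallId: call.id,
    toolName: call.name,
    content: JSON.stringify({ kind: 'tool-durability-error', error: err.message }),
  };
}
```

Once the synthetic rows are appended, the same scan returns `undefined`, so a second resume does not emit duplicate errors.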
Phase 4 of durable tool execution. Pure-read tools, or handlers that are safe to run twice, can now declare `safeToRetry: true` to opt out of the conservative ToolDurabilityError path. On resume with a missing tool_result, the framework re-invokes those handlers and persists the real result instead of asking the LLM to decide recovery.

Types:
- ToolConfig.safeToRetry?: boolean (public; documented with an explicit callout that this is about resume replay, NOT HTTP retry).
- ToolDefinition.safeToRetry?: boolean (forwarded by tool()).

run.ts (resume dispatch):
- Orphan handling moves from the session-load branch to after ctx/agents/resolvedTools are built, so executeToolCall can run with a real ToolContext.
- Per missing tool_call: if resolvedTools[name]?.safeToRetry, call executeToolCall, which handles input validation, output validation, and handler errors identically to the loop. Otherwise surface the ToolDurabilityError as before.
- Re-invocations + durability-error messages persist atomically in a single appendMessagesAtomic call (one batch per resume, not per tool).

react-loop.ts:
- Export executeToolCall + ToolCallResult so run.ts can reuse the same code path (tool-not-found / no-handler / input/output validation / handler errors are all encoded the same way).

Tests:
- tool.test.ts: safeToRetry forwards correctly (true → true, omitted → undefined).
- durable-resume.test.ts: new E2E scenario. A safeToRetry tool crashes mid-step, the handler re-invokes on resume, the real result lands, and NO ToolDurabilityError appears in history. The handler count goes 1 → 2, which is exactly the point.
- Updated the type-level test: safeToRetry: true compiles (the negative was removed), safeToRetry: 'yes' is rejected.

240/240 tests pass. 2 pre-existing no-throw-plain-error warnings unchanged by this commit. Phases 1–4 deliver the full feature. Phase 5 wraps docs + changeset + retrospective + PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
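The resume dispatch rule can be sketched as follows. `resolveOrphansOnResume` and the simplified `ResolvedTool` shape are hypothetical; the real code routes re-invocations through `executeToolCall` (validation and error encoding), which this sketch replaces with a direct handler call, and the error JSON beyond `kind` is an assumption.

```typescript
// Sketch of resume dispatch with the safeToRetry opt-in. Names and shapes
// are hypothetical; executeToolCall is replaced by a direct handler call.
interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

interface ToolMessage {
  role: 'tool';
  toolCallId: string;
  toolName: string;
  content: string;
}

interface ResolvedTool {
  safeToRetry?: boolean;
  handler: (input: unknown) => Promise<unknown>;
}

async function resolveOrphansOnResume(
  missing: ToolCall[],
  tools: Record<string, ResolvedTool | undefined>,
  appendMessagesAtomic: (messages: ToolMessage[]) => Promise<void>,
): Promise<ToolMessage[]> {
  const messages: ToolMessage[] = [];
  for (const call of missing) {
    const tool = tools[call.name];
    const base = { role: 'tool' as const, toolCallId: call.id, toolName: call.name };
    if (tool?.safeToRetry) {
      // Declared safe to run twice: re-invoke and keep the real result.
      const output = await tool.handler(call.input);
      messages.push({ ...base, content: JSON.stringify(output ?? null) });
    } else {
      // Default (and unknown tools): never replay; let the LLM decide recovery.
      messages.push({ ...base, content: JSON.stringify({ kind: 'tool-durability-error', toolName: call.name, toolCallId: call.id }) });
    }
  }
  // One atomic write for the whole resume, not one per tool.
  await appendMessagesAtomic(messages);
  return messages;
}
```

Batching per resume means a crash during resume itself leaves the orphan intact, so the next resume simply repeats the same decision.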
Phase 5: the release wrap for durable tool execution.

- packages/mint-docs/guides/agents/durable-resume.mdx: a full user-facing guide covering activation (store + sessionId on a durable store), the safeToRetry flag with an explicit "this is NOT network retry" callout, the crash-window taxonomy table in user terms, cost guidance (~2 writes/step on D1), and the ToolDurabilityError inspection pattern for resumed history.
- docs.json nav updated to include the new page.
- .changeset/agents-durable-resume.md: patch bump with the full public-surface diff (new + removed).
- plans/post-implementation-reviews/agents-durable-resume.md: retro covering what shipped, what worked, what didn't, a manual-verification checklist for the CF DO staging run, and open follow-ups.

The manual CF DO verification is release-gating, not merge-gating: the framework tests all pass, and the retro captures the checklist to run post-merge against triagebot staging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure oxfmt pass on the new durable-resume guide to unblock CI. No content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…drift)

These two files are identical to origin/main but fail `oxfmt --check`, blocking CI on this PR. Pure whitespace/wrap normalization; no content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

Closes #2835. Ships durable tool execution for `@vertz/agents`: at-most-once handler execution across crashes by default (tools declared `safeToRetry` may re-run), with automatic resume when `run()` is called with `store + sessionId` against a durable store. The production triagebot path is now unblocked (#2834 Anthropic adapter + this feature).

## Public API Changes
Added (patch bump):

- `AgentStore.appendMessagesAtomic(sessionId, messages, session)`: the durability primitive. Implemented on memory (throws), SQLite (`db.transaction`), and D1 (`db.batch`).
- `MemoryStoreNotDurableError`: thrown at `run()` entry when `memoryStore()` is paired with `sessionId`. Catches chat-only agents that would otherwise silently lose data.
- `ToolDurabilityError`: surfaced as a tool_result when an orphaned non-`safeToRetry` call is detected on resume. Exported from the main barrel so callers can pattern-match.
- `tool({ safeToRetry: true, ... })`: optional per-tool flag. When `true`, the framework may re-invoke the handler on resume. The default assumes side effects.
- `@vertz/agents/testing` subpath: `crashAfterToolResults(store, N)` harness for writing resume tests.

Removed (pre-v1, no shim):

- `AgentLoopConfig.checkpointInterval`: was a notification callback, not durability.
- `ReactLoopOptions.onCheckpoint`: same.

## Design
Three agent sign-offs (DX, Product, Technical) + human sign-off on the Rev 2 design doc before implementation. See:

- `plans/agents-durable-resume.md`
- `plans/agents-durable-resume/phase-01..05.md`
- `plans/post-implementation-reviews/agents-durable-resume.md`

## Phases summary
| Phase | Scope |
|---|---|
| 1 | `appendMessagesAtomic` interface + 3 impls, `MemoryStoreNotDurableError` at `run()` entry, `@vertz/agents/testing` subpath with crash harness, E2E test as TDD RED + perf gate (~4ms for a 10-step in-memory SQLite loop, well under the 200ms budget). |
| 2 | `persistStep` callback. End-of-run flush kept only for the non-durable path. `checkpointInterval` deleted across the package. |
| 3 | `ToolDurabilityError` tool_result. E2E flips GREEN. MVP complete. |
| 4 | `tool({ safeToRetry })` opt-in. Resume re-invokes handlers declared safe; falls back to the error for the rest. |
| 5 | `packages/mint-docs/`, changeset, retrospective with CF DO manual-verification checklist. |

## E2E acceptance test status
Both scenarios are green at `packages/agents/src/__tests__/durable-resume.test.ts`:

- `ToolDurabilityError` in resumed history.
- `safeToRetry: true` tool + crash → handler re-invokes on resume, real result persisted.

Plus, the memory-store guard fires at `run()` entry (before any LLM call) for both tool-calling and chat-only agents.

## Test plan
- `vtz test` in `packages/agents`: 243 pass, 13 skipped (unrelated to this PR).
- `vtz run typecheck` in `packages/agents`: clean.
- `durable-resume.perf.test.ts`: ~4ms, < 200ms budget.

## Breaking changes
Pre-v1, so technically there are no external breaking changes. For existing callers:

- `checkpointInterval` / `onCheckpoint` removed. Any caller using them must migrate; resume is the replacement if that's what they wanted.
- `memoryStore() + sessionId` now throws at entry. Callers wanting in-process session continuity should use `sqliteStore({ path: ':memory:' })`; those wanting stateless runs should drop `sessionId`. All affected tests in this repo were migrated in this PR.

🤖 Generated with Claude Code