Skip duplicate ids in batchCreateMemories instead of aborting#64

Merged
jgpruitt merged 3 commits into main from
brent/fix-batchcreate-on-conflict-do-nothing
May 1, 2026

Conversation

@graveland
Contributor

@graveland graveland commented Apr 30, 2026

Summary

A single duplicate id within a batch — or a retry that includes ids that already landed — currently aborts the surrounding transaction in batchCreateMemories and rolls back every sibling row. The visible symptom is the importer losing whole sessions to a single rogue dup, even when only one of the planned messages actually conflicts.

Companion to #63 (importer-side dedup): that fix removes the typical cause; this one makes the contract robust regardless of the caller.

Fix

Switch the batch insert to ON CONFLICT (id) DO NOTHING and skip the returned-row check when no row comes back. The returned ids[] may now be shorter than params[] when conflicts occur; the only in-tree caller (the importer's batchCreate) doesn't read it, and the new docstring spells out the semantics for future callers. Since these conflicts are potentially due to UUIDs matching more than one conversation, we may want to handle ids differently down the road; alternatively, when duplicates occur an upsert (ON CONFLICT DO UPDATE) might be preferable to DO NOTHING.

This is safe because typical callers either:

  • supply deterministic ids (e.g. the importer derives UUIDv7 from a stable hash), in which case a conflict means "we already have this row," or
  • let the column default produce fresh UUIDv7s, in which case conflicts are statistically impossible.
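The query shape this fix describes can be sketched as follows. Note that `buildBatchInsertSql` and the `(id, content)` column list are illustrative assumptions for this sketch, not the actual engine code:

```typescript
// Hypothetical sketch of the post-fix batch insert; names and columns
// are assumptions, not the real batchCreateMemories implementation.
function buildBatchInsertSql(rowCount: number): string {
  // One ($n, $n+1) placeholder tuple per row.
  const tuples = Array.from(
    { length: rowCount },
    (_, i) => `($${i * 2 + 1}, $${i * 2 + 2})`,
  ).join(", ");
  // ON CONFLICT (id) DO NOTHING skips duplicate ids instead of aborting
  // the surrounding transaction; RETURNING id yields only the rows that
  // actually landed, so the result may be shorter than the input batch.
  return (
    `INSERT INTO memories (id, content) VALUES ${tuples} ` +
    `ON CONFLICT (id) DO NOTHING RETURNING id`
  );
}
```

Callers that care about skipped rows can diff the returned ids against the ids they supplied.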

Test plan

  • 2 new integration tests in `packages/engine/db.integration.test.ts` covering dup-within-batch and existing-row re-insert
  • `./bun run typecheck` clean
  • `./bun run lint` clean
  • All 55 engine integration tests pass against PG 18 with pgvector + pg_textsearch

@graveland graveland marked this pull request as ready for review May 1, 2026 02:19
@graveland graveland requested a review from jgpruitt as a code owner May 1, 2026 02:19
@jgpruitt jgpruitt requested a review from murrayju May 1, 2026 13:07
jgpruitt added 2 commits May 1, 2026 09:29
Post-#64, engine.memory.batchCreate silently drops conflicting ids,
so the previous 'Installed N memories' line could under-count
without the user noticing — a re-install of the same version would
report 0 installed, and a real id collision with a non-pack memory
would be invisible.

Classify each skipped id as either:
  - idempotent: already tagged with this pack name+version (benign)
  - conflict:   held by something else — different pack, different
                version, or a non-pack memory (suspicious)

Conflicts are surfaced via clack.log.warn with the offending ids
and a 'me memory get' hint. Exit code stays 0; --strict is left for
a future addition.

Dry-run now predicts wouldSkipIdempotent from same-version rows in
the step-3 search so the preview matches the real install for the
common cases. Conflicts with non-pack memories aren't predictable
without extra lookups and are documented as such.

JSON output adds skipped / skippedIdempotent / skippedConflict (and
skippedConflictIds when non-empty); existing fields are unchanged.

The classifier is exported as a pure function and covered by 9 unit
tests in pack.test.ts.
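The idempotent-vs-conflict classification described above could look roughly like this sketch; the type shape and the name `classifySkippedId` are assumptions based on the commit message, not the exported classifier covered by pack.test.ts:

```typescript
// Assumed shape of an existing row that holds a skipped id; the real
// type lives in the pack-install code.
interface ExistingMemory {
  id: string;
  packName?: string;
  packVersion?: string;
}

type SkipKind = "idempotent" | "conflict";

// A skipped id is benign (idempotent) only when the existing row is
// already tagged with the same pack name+version. Anything else — a
// different pack, a different version, or a non-pack memory with no
// pack tags at all — is a suspicious conflict worth warning about.
function classifySkippedId(
  existing: ExistingMemory,
  packName: string,
  packVersion: string,
): SkipKind {
  return existing.packName === packName && existing.packVersion === packVersion
    ? "idempotent"
    : "conflict";
}
```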
Apply the post-#64 skip-surfacing pattern from `me pack install` to
the two remaining batchCreate callers. Memory import has no shared
metadata to classify skips against (unlike pack install's pack
name+version), so this collapses to a single 'skipped' bucket plus
the list of conflicting ids. The user/agent can `me memory get
<id>` to investigate any specific skip.

CLI `me memory import`:
- New JSON fields: `skipped` (count), `skippedIds` (array). Existing
  `imported`, `failed`, `ids`, `errors` unchanged.
- Text output gains three branches: clean (unchanged), partial skip
  ('Imported 3 memories (2 skipped — id already exists)'), and
  all-skipped ('Imported 0 memories (N already exist, no changes)').
- `--verbose` lists each skipped id inline as '⊝ <id>'.
- Exit code unchanged: skipped does not contribute to non-zero exit,
  matching the pack install policy.
- Dry-run unchanged — predicting skips would require N extra
  `memory.get` lookups; documented as a limitation.
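A minimal sketch of the three text-output branches above; `formatImportSummary` is a hypothetical name, and the real CLI routes its output through clack rather than returning a bare string:

```typescript
// Sketch of the message logic only; formatImportSummary is an assumed
// name, and verbose per-id lines are handled separately in the real CLI.
function formatImportSummary(imported: number, skipped: number): string {
  if (skipped === 0) {
    // Clean branch: behavior unchanged from before the fix.
    return `Imported ${imported} memories`;
  }
  if (imported === 0) {
    // All-skipped branch: nothing landed, nothing changed.
    return `Imported 0 memories (${skipped} already exist, no changes)`;
  }
  // Partial-skip branch.
  return `Imported ${imported} memories (${skipped} skipped — id already exists)`;
}
```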

MCP `me_memory_import`:
- Response now includes `skipped` and `skippedIds` (always present,
  possibly empty) alongside the existing `imported` and `ids`.
- `idempotentHint` flipped from false to true — server-side
  `ON CONFLICT DO NOTHING` makes repeat calls land the engine in
  the same state.

Pure helper `computeSkippedIds(explicitIds, insertedIds)` is
exported and covered by 6 unit tests in memory-import.test.ts.
Auto-id memories are correctly excluded from the skip count
(server-generated UUIDv7s never collide).

Docs updated for both surfaces.
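Given the description above, the helper is essentially a set difference; this sketch assumes only the signature stated in the commit message:

```typescript
// Sketch of computeSkippedIds as described above. explicitIds are the
// ids the caller supplied; insertedIds are the ids batchCreate actually
// returned. Auto-id memories never appear in explicitIds, so
// server-generated UUIDv7s cannot inflate the skip count.
function computeSkippedIds(
  explicitIds: string[],
  insertedIds: string[],
): string[] {
  const inserted = new Set(insertedIds);
  return explicitIds.filter((id) => !inserted.has(id));
}
```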
@jgpruitt jgpruitt merged commit 4bdaa0a into main May 1, 2026
3 checks passed
jgpruitt added a commit that referenced this pull request May 1, 2026
@jgpruitt jgpruitt deleted the brent/fix-batchcreate-on-conflict-do-nothing branch May 1, 2026 14:43
jgpruitt added a commit that referenced this pull request May 1, 2026
The chunk loop + per-chunk try/catch + accumulator pattern is about to
appear in three more places (me memory import, the MCP me_memory_import
tool, me pack install). Extract the orchestration into a reusable
helper so all four call sites share one source of truth.

batchCreateChunked(client, memories):
  - Iterates chunkMemoriesForBatchCreate(memories) sequentially.
  - Catches per-chunk failures, records them in `errors`, accumulates
    each failed chunk's explicit ids in `failedIds` (and exposes them
    per-error via `errors[].ids` for callers that need attribution).
  - Returns { insertedIds, failedIds, errors }.
  - Structurally typed BatchCreateClient so tests can pass a stub.

Refactored writeSession to use it. Behavior preserved: each failed
chunk contributes chunk.length to outcome.failed, and each id in the
failed chunk gets a row in outcome.errors with the chunk's error
message — matching the previous per-message error attribution.

Six new unit tests for batchCreateChunked covering single-chunk,
two-chunk accumulation, partial failure, total failure, post-#64
shorter-than-input ids, and empty input.
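The orchestration described above can be sketched as follows, under assumed type shapes and a fixed chunk size of 100; the real helper delegates chunking to `chunkMemoriesForBatchCreate` and uses the engine's own memory type:

```typescript
// Assumed shapes; the real code structurally types the client so tests
// can pass a stub, which this sketch mirrors.
interface MemoryInput { id?: string; content: string }

interface BatchCreateClient {
  batchCreateMemories(chunk: MemoryInput[]): Promise<string[]>;
}

interface BatchCreateResult {
  insertedIds: string[];
  failedIds: string[];
  errors: { message: string; ids: string[] }[];
}

// Stand-in for chunkMemoriesForBatchCreate; 100 is an assumed size.
function chunkMemories<T>(items: T[], size = 100): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function batchCreateChunked(
  client: BatchCreateClient,
  memories: MemoryInput[],
): Promise<BatchCreateResult> {
  const result: BatchCreateResult = { insertedIds: [], failedIds: [], errors: [] };
  for (const chunk of chunkMemories(memories)) {
    try {
      // Post-#64 the returned ids may be shorter than the chunk when
      // conflicts were skipped server-side.
      result.insertedIds.push(...(await client.batchCreateMemories(chunk)));
    } catch (err) {
      // Record the failed chunk's explicit ids both in the flat
      // accumulator and per-error, so callers can attribute failures.
      const ids = chunk.flatMap((m) => (m.id ? [m.id] : []));
      result.failedIds.push(...ids);
      result.errors.push({ message: String(err), ids });
    }
  }
  return result;
}
```

A stub client is enough to exercise both the success and failure paths, which is what makes the helper unit-testable without a database.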