Skip duplicate ids in batchCreateMemories instead of aborting#64

Merged
jgpruitt merged 3 commits into main from
brent/fix-batchcreate-on-conflict-do-nothing
May 1, 2026

Conversation

@graveland
Contributor

@graveland graveland commented Apr 30, 2026

Summary

A single duplicate id within a batch — or a retry that includes ids that already landed — currently aborts the surrounding transaction in batchCreateMemories and rolls back every sibling row. The visible symptom is the importer losing whole sessions to a single rogue dup, even when only one of the planned messages actually conflicts.

Companion to #63 (importer-side dedup): that fix removes the typical cause; this one makes the contract robust regardless of the caller.

Fix

Switch the batch insert to ON CONFLICT (id) DO NOTHING and skip the returned-row check when no row comes back. The returned ids[] may now be shorter than params[] when conflicts occur; the only in-tree caller (the importer's batchCreate) doesn't read it, and the new docstring spells out the semantics for future callers. Since these conflicts are potentially due to UUIDs matching more than one conversation, we may want to handle ids differently down the road; alternatively, when duplicates occur an upsert (ON CONFLICT DO UPDATE) might be preferable to DO NOTHING.

This is safe because typical callers either:

  • supply deterministic ids (e.g. the importer derives UUIDv7 from a stable hash), in which case a conflict means "we already have this row," or
  • let the column default produce fresh UUIDv7s, in which case conflicts are statistically impossible.
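The query shape this fix describes can be sketched as follows. Note that `buildBatchInsertSql` and the `(id, content)` column list are illustrative assumptions for this sketch, not the actual engine code:

```typescript
// Hypothetical sketch of the post-fix batch insert; names and columns
// are assumptions, not the real batchCreateMemories implementation.
function buildBatchInsertSql(rowCount: number): string {
  // One ($n, $n+1) placeholder tuple per row.
  const tuples = Array.from(
    { length: rowCount },
    (_, i) => `($${i * 2 + 1}, $${i * 2 + 2})`,
  ).join(", ");
  // ON CONFLICT (id) DO NOTHING skips duplicate ids instead of aborting
  // the surrounding transaction; RETURNING id yields only the rows that
  // actually landed, so the result may be shorter than the input batch.
  return (
    `INSERT INTO memories (id, content) VALUES ${tuples} ` +
    `ON CONFLICT (id) DO NOTHING RETURNING id`
  );
}
```

Callers that care about skipped rows can diff the returned ids against the ids they supplied.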

Test plan

  • 2 new integration tests in `packages/engine/db.integration.test.ts` covering dup-within-batch and existing-row re-insert
  • `./bun run typecheck` clean
  • `./bun run lint` clean
  • All 55 engine integration tests pass against PG 18 with pgvector + pg_textsearch

@graveland graveland marked this pull request as ready for review May 1, 2026 02:19
@graveland graveland requested a review from jgpruitt as a code owner May 1, 2026 02:19
@jgpruitt jgpruitt requested a review from murrayju May 1, 2026 13:07
jgpruitt added 2 commits May 1, 2026 09:29
Post-#64, engine.memory.batchCreate silently drops conflicting ids,
so the previous 'Installed N memories' line could under-count
without the user noticing — a re-install of the same version would
report 0 installed, and a real id collision with a non-pack memory
would be invisible.

Classify each skipped id as either:
  - idempotent: already tagged with this pack name+version (benign)
  - conflict:   held by something else — different pack, different
                version, or a non-pack memory (suspicious)

Conflicts are surfaced via clack.log.warn with the offending ids
and a 'me memory get' hint. Exit code stays 0; --strict is left for
a future addition.

Dry-run now predicts wouldSkipIdempotent from same-version rows in
the step-3 search so the preview matches the real install for the
common cases. Conflicts with non-pack memories aren't predictable
without extra lookups and are documented as such.

JSON output adds skipped / skippedIdempotent / skippedConflict (and
skippedConflictIds when non-empty); existing fields are unchanged.

The classifier is exported as a pure function and covered by 9 unit
tests in pack.test.ts.
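The idempotent-vs-conflict classification described above could look roughly like this sketch; the type shape and the name `classifySkippedId` are assumptions based on the commit message, not the exported classifier covered by pack.test.ts:

```typescript
// Assumed shape of an existing row that holds a skipped id; the real
// type lives in the pack-install code.
interface ExistingMemory {
  id: string;
  packName?: string;
  packVersion?: string;
}

type SkipKind = "idempotent" | "conflict";

// A skipped id is benign (idempotent) only when the existing row is
// already tagged with the same pack name+version. Anything else — a
// different pack, a different version, or a non-pack memory with no
// pack tags at all — is a suspicious conflict worth warning about.
function classifySkippedId(
  existing: ExistingMemory,
  packName: string,
  packVersion: string,
): SkipKind {
  return existing.packName === packName && existing.packVersion === packVersion
    ? "idempotent"
    : "conflict";
}
```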
Apply the post-#64 skip-surfacing pattern from `me pack install` to
the two remaining batchCreate callers. Memory import has no shared
metadata to classify skips against (unlike pack install's pack
name+version), so this collapses to a single 'skipped' bucket plus
the list of conflicting ids. The user/agent can `me memory get
<id>` to investigate any specific skip.

CLI `me memory import`:
- New JSON fields: `skipped` (count), `skippedIds` (array). Existing
  `imported`, `failed`, `ids`, `errors` unchanged.
- Text output gains three branches: clean (unchanged), partial skip
  ('Imported 3 memories (2 skipped — id already exists)'), and
  all-skipped ('Imported 0 memories (N already exist, no changes)').
- `--verbose` lists each skipped id inline as '⊝ <id>'.
- Exit code unchanged: skipped does not contribute to non-zero exit,
  matching the pack install policy.
- Dry-run unchanged — predicting skips would require N extra
  `memory.get` lookups; documented as a limitation.
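A minimal sketch of the three text-output branches above; `formatImportSummary` is a hypothetical name, and the real CLI routes its output through clack rather than returning a bare string:

```typescript
// Sketch of the message logic only; formatImportSummary is an assumed
// name, and verbose per-id lines are handled separately in the real CLI.
function formatImportSummary(imported: number, skipped: number): string {
  if (skipped === 0) {
    // Clean branch: behavior unchanged from before the fix.
    return `Imported ${imported} memories`;
  }
  if (imported === 0) {
    // All-skipped branch: nothing landed, nothing changed.
    return `Imported 0 memories (${skipped} already exist, no changes)`;
  }
  // Partial-skip branch.
  return `Imported ${imported} memories (${skipped} skipped — id already exists)`;
}
```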

MCP `me_memory_import`:
- Response now includes `skipped` and `skippedIds` (always present,
  possibly empty) alongside the existing `imported` and `ids`.
- `idempotentHint` flipped from false to true — server-side
  `ON CONFLICT DO NOTHING` makes repeat calls land the engine in
  the same state.

Pure helper `computeSkippedIds(explicitIds, insertedIds)` is
exported and covered by 6 unit tests in memory-import.test.ts.
Auto-id memories are correctly excluded from the skip count
(server-generated UUIDv7s never collide).

Docs updated for both surfaces.
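Given the description above, the helper is essentially a set difference; this sketch assumes only the signature stated in the commit message:

```typescript
// Sketch of computeSkippedIds as described above. explicitIds are the
// ids the caller supplied; insertedIds are the ids batchCreate actually
// returned. Auto-id memories never appear in explicitIds, so
// server-generated UUIDv7s cannot inflate the skip count.
function computeSkippedIds(
  explicitIds: string[],
  insertedIds: string[],
): string[] {
  const inserted = new Set(insertedIds);
  return explicitIds.filter((id) => !inserted.has(id));
}
```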
@jgpruitt jgpruitt merged commit 4bdaa0a into main May 1, 2026
3 checks passed
jgpruitt added a commit that referenced this pull request May 1, 2026
@jgpruitt jgpruitt deleted the brent/fix-batchcreate-on-conflict-do-nothing branch May 1, 2026 14:43
jgpruitt added a commit that referenced this pull request May 1, 2026
The chunk loop + per-chunk try/catch + accumulator pattern is about to
appear in three more places (me memory import, the MCP me_memory_import
tool, me pack install). Extract the orchestration into a reusable
helper so all four call sites share one source of truth.

batchCreateChunked(client, memories):
  - Iterates chunkMemoriesForBatchCreate(memories) sequentially.
  - Catches per-chunk failures, records them in `errors`, accumulates
    each failed chunk's explicit ids in `failedIds` (and exposes them
    per-error via `errors[].ids` for callers that need attribution).
  - Returns { insertedIds, failedIds, errors }.
  - Structurally typed BatchCreateClient so tests can pass a stub.

Refactored writeSession to use it. Behavior preserved: each failed
chunk contributes chunk.length to outcome.failed, and each id in the
failed chunk gets a row in outcome.errors with the chunk's error
message — matching the previous per-message error attribution.

Six new unit tests for batchCreateChunked covering single-chunk,
two-chunk accumulation, partial failure, total failure, post-#64
shorter-than-input ids, and empty input.
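The orchestration described above can be sketched as follows, under assumed type shapes and a fixed chunk size of 100; the real helper delegates chunking to `chunkMemoriesForBatchCreate` and uses the engine's own memory type:

```typescript
// Assumed shapes; the real code structurally types the client so tests
// can pass a stub, which this sketch mirrors.
interface MemoryInput { id?: string; content: string }

interface BatchCreateClient {
  batchCreateMemories(chunk: MemoryInput[]): Promise<string[]>;
}

interface BatchCreateResult {
  insertedIds: string[];
  failedIds: string[];
  errors: { message: string; ids: string[] }[];
}

// Stand-in for chunkMemoriesForBatchCreate; 100 is an assumed size.
function chunkMemories<T>(items: T[], size = 100): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function batchCreateChunked(
  client: BatchCreateClient,
  memories: MemoryInput[],
): Promise<BatchCreateResult> {
  const result: BatchCreateResult = { insertedIds: [], failedIds: [], errors: [] };
  for (const chunk of chunkMemories(memories)) {
    try {
      // Post-#64 the returned ids may be shorter than the chunk when
      // conflicts were skipped server-side.
      result.insertedIds.push(...(await client.batchCreateMemories(chunk)));
    } catch (err) {
      // Record the failed chunk's explicit ids both in the flat
      // accumulator and per-error, so callers can attribute failures.
      const ids = chunk.flatMap((m) => (m.id ? [m.id] : []));
      result.failedIds.push(...ids);
      result.errors.push({ message: String(err), ids });
    }
  }
  return result;
}
```

A stub client is enough to exercise both the success and failure paths, which is what makes the helper unit-testable without a database.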