Skip duplicate ids in batchCreateMemories instead of aborting#64
Merged
Conversation
A single duplicate id within a batch — or a retry that includes ids
that already landed — currently aborts the surrounding transaction
and rolls back every sibling row. The visible symptom is the importer
losing whole sessions to a single rogue dup, even when only one of
the planned messages actually conflicts.
Switch the batch insert to ON CONFLICT (id) DO NOTHING and skip the
returned-row check when no row comes back. The returned ids array
may now be shorter than params when conflicts occur; the only
in-tree caller (importer batchCreate) doesn't read it, and the new
docstring spells out the semantic for future callers.
This is safe because typical callers either:
(a) supply deterministic ids (the importer derives UUIDv7 from a
stable hash), in which case a conflict means 'we already have
this row,' or
(b) let the column default produce fresh UUIDv7s, in which case
conflicts are statistically impossible.
Two integration tests cover the dup-within-batch and existing-row
re-insert cases.
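The contract change can be sketched in isolation. The following is a hypothetical in-memory simulation of the post-fix semantics (not the project's actual code): a conflicting id produces no returned row, so the returned ids array can be shorter than the input.

```typescript
interface MemoryRow {
  id: string;
  content: string;
}

// Simulates INSERT ... ON CONFLICT (id) DO NOTHING RETURNING id over an
// in-memory id set: rows whose id already exists are silently skipped,
// so the result may be shorter than `rows`.
function batchCreateSimulated(
  existing: Set<string>,
  rows: MemoryRow[],
): string[] {
  const inserted: string[] = [];
  for (const row of rows) {
    if (existing.has(row.id)) continue; // conflict: no returned row
    existing.add(row.id);
    inserted.push(row.id);
  }
  return inserted;
}
```

A retry that re-sends already-landed ids now inserts only the missing rows instead of aborting the whole batch.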
Post-#64, engine.memory.batchCreate silently drops conflicting ids, so the previous 'Installed N memories' line could under-count without the user noticing — a re-install of the same version would report 0 installed, and a real id collision with a non-pack memory would be invisible.

Classify each skipped id as either:
- idempotent: already tagged with this pack name+version (benign)
- conflict: held by something else — different pack, different version, or a non-pack memory (suspicious)

Conflicts are surfaced via clack.log.warn with the offending ids and a 'me memory get' hint. Exit code stays 0; --strict is left for a future addition.

Dry-run now predicts wouldSkipIdempotent from same-version rows in the step-3 search so the preview matches the real install for the common cases. Conflicts with non-pack memories aren't predictable without extra lookups and are documented as such.

JSON output adds skipped / skippedIdempotent / skippedConflict (and skippedConflictIds when non-empty); existing fields are unchanged.

The classifier is exported as a pure function and covered by 9 unit tests in pack.test.ts.
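The idempotent-vs-conflict split described above can be sketched as a pure function. This is a hypothetical shape — the real classifier's name, field names, and lookup source in pack.ts may differ:

```typescript
type SkipKind = "idempotent" | "conflict";

// Assumed shape: how an existing memory's pack provenance is exposed.
interface ExistingMemory {
  id: string;
  packName?: string;
  packVersion?: string;
}

// Classify one skipped id against whatever already holds that id.
function classifySkippedId(
  skippedId: string,
  installing: { packName: string; packVersion: string },
  existingById: Map<string, ExistingMemory>,
): SkipKind {
  const existing = existingById.get(skippedId);
  // Same pack name+version: a benign re-install of the same content.
  if (
    existing !== undefined &&
    existing.packName === installing.packName &&
    existing.packVersion === installing.packVersion
  ) {
    return "idempotent";
  }
  // Different pack, different version, non-pack memory, or no lookup
  // row at all: a suspicious collision worth a warning.
  return "conflict";
}
```

Keeping this pure (no engine calls) is what makes it trivially unit-testable, matching the 9-test coverage noted above.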
Apply the post-#64 skip-surfacing pattern from `me pack install` to the two remaining batchCreate callers. Memory import has no shared metadata to classify skips against (unlike pack install's pack name+version), so this collapses to a single 'skipped' bucket plus the list of conflicting ids. The user/agent can `me memory get <id>` to investigate any specific skip.

CLI `me memory import`:
- New JSON fields: `skipped` (count), `skippedIds` (array). Existing `imported`, `failed`, `ids`, `errors` unchanged.
- Text output gains three branches: clean (unchanged), partial skip ('Imported 3 memories (2 skipped — id already exists)'), and all-skipped ('Imported 0 memories (N already exist, no changes)').
- `--verbose` lists each skipped id inline as '⊝ <id>'.
- Exit code unchanged: skipped does not contribute to non-zero exit, matching the pack install policy.
- Dry-run unchanged — predicting skips would require N extra `memory.get` lookups; documented as a limitation.

MCP `me_memory_import`:
- Response now includes `skipped` and `skippedIds` (always present, possibly empty) alongside the existing `imported` and `ids`.
- `idempotentHint` flipped from false to true — server-side `ON CONFLICT DO NOTHING` makes repeat calls land the engine in the same state.

Pure helper `computeSkippedIds(explicitIds, insertedIds)` is exported and covered by 6 unit tests in memory-import.test.ts. Auto-id memories are correctly excluded from the skip count (server-generated UUIDv7s never collide). Docs updated for both surfaces.
jgpruitt added a commit that referenced this pull request on May 1, 2026
jgpruitt added a commit that referenced this pull request on May 1, 2026
The chunk loop + per-chunk try/catch + accumulator pattern is about to
appear in three more places (me memory import, the MCP me_memory_import
tool, me pack install). Extract the orchestration into a reusable
helper so all four call sites share one source of truth.
batchCreateChunked(client, memories):
- Iterates chunkMemoriesForBatchCreate(memories) sequentially.
- Catches per-chunk failures, records them in `errors`, accumulates
each failed chunk's explicit ids in `failedIds` (and exposes them
per-error via `errors[].ids` for callers that need attribution).
- Returns { insertedIds, failedIds, errors }.
- Structurally typed BatchCreateClient so tests can pass a stub.
Refactored writeSession to use it. Behavior preserved: each failed
chunk contributes chunk.length to outcome.failed, and each id in the
failed chunk gets a row in outcome.errors with the chunk's error
message — matching the previous per-message error attribution.
Six new unit tests for batchCreateChunked covering single-chunk,
two-chunk accumulation, partial failure, total failure, post-#64
shorter-than-input ids, and empty input.
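The helper described above can be sketched roughly as follows. This is a hypothetical reconstruction from the commit message: `BatchCreateClient` is structurally typed as stated, but the chunker here is a stand-in (the real `chunkMemoriesForBatchCreate` and its sizing rules live elsewhere in the repo), and the fixed `chunkSize` is an assumption.

```typescript
interface Memory {
  id?: string; // explicit id, or undefined to use the column default
  content: string;
}

// Structural client type so tests can pass a stub instead of the engine.
interface BatchCreateClient {
  batchCreateMemories(rows: Memory[]): Promise<string[]>;
}

interface BatchCreateOutcome {
  insertedIds: string[];
  failedIds: string[];
  errors: { message: string; ids: string[] }[];
}

// Stand-in chunker; the real chunkMemoriesForBatchCreate may size
// chunks by bound-parameter count rather than row count.
function chunkMemories(items: Memory[], size: number): Memory[][] {
  const out: Memory[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function batchCreateChunked(
  client: BatchCreateClient,
  memories: Memory[],
  chunkSize = 50, // assumed default
): Promise<BatchCreateOutcome> {
  const outcome: BatchCreateOutcome = { insertedIds: [], failedIds: [], errors: [] };
  for (const part of chunkMemories(memories, chunkSize)) {
    try {
      // Post-#64 this may return fewer ids than rows when conflicts occur.
      outcome.insertedIds.push(...(await client.batchCreateMemories(part)));
    } catch (err) {
      // Attribute the whole failed chunk: accumulate its explicit ids
      // globally and also per-error for callers that need attribution.
      const ids = part.flatMap((m) => (m.id !== undefined ? [m.id] : []));
      outcome.failedIds.push(...ids);
      outcome.errors.push({ message: String(err), ids });
    }
  }
  return outcome;
}
```

Sequential iteration (rather than `Promise.all`) keeps one chunk's failure from interleaving with its siblings and preserves the per-chunk error attribution the refactored writeSession relies on.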
jgpruitt added a commit that referenced this pull request on May 1, 2026
Summary

A single duplicate id within a batch — or a retry that includes ids that already landed — currently aborts the surrounding transaction in `batchCreateMemories` and rolls back every sibling row. The visible symptom is the importer losing whole sessions to a single rogue dup, even when only one of the planned messages actually conflicts.

Companion to #63 (importer-side dedup): that fix removes the typical cause; this one makes the contract robust regardless of caller.

Fix

Switch the batch insert to `ON CONFLICT (id) DO NOTHING` and skip the returned-row check when no row comes back. The returned `ids[]` may now be shorter than `params[]` when conflicts occur; the only in-tree caller (the importer's `batchCreate`) doesn't read it, and the new docstring spells out the semantic for future callers.

Since conflicts are potentially due to the UUIDs matching more than one conversation, this could also mean we should handle ids differently; alternatively, if there are duplicates, we might want to do an upsert instead of do nothing?

This is safe because typical callers either:
(a) supply deterministic ids (the importer derives UUIDv7 from a stable hash), in which case a conflict means 'we already have this row,' or
(b) let the column default produce fresh UUIDv7s, in which case conflicts are statistically impossible.

Test plan

Two integration tests cover the dup-within-batch and existing-row re-insert cases.