perf(memory): batch re-embed backfill into one provider request#3392
Conversation
`handle_reembed_backfill` embedded each chunk/summary body with a separate sequential `embedder.embed(&body).await` — N real Voyage round-trips per bounded batch (REEMBED_BACKFILL_BATCH = 16). Collapse those into a single batched provider request. - Add a default `Embedder::embed_batch(&[&str]) -> Vec<Result<Vec<f32>>>` (sequential fallback) plus `embed_batch_via_provider`, which issues one inner-provider `embed(&[..])` call and falls back to per-text `embed_one` on a whole-batch error or a length mismatch so per-position error attribution / tombstoning is preserved. - Override `embed_batch` in `CloudEmbedder` and `OpenAiCompatEmbedder` (both wrap a batch-capable `EmbeddingProvider`); Ollama keeps the sequential default (its memory-tree path has no batch endpoint). - Refactor the two reembed loops into one generic `reembed_collect` (Phase A: read bodies + tombstone read failures; Phase B: one `embed_batch`; Phase C: classify per position exactly as before). All tinyhumansai#1574 §6 failure/tombstone semantics and log lines are preserved verbatim — only the embed call shape changes. Tests: 7 new unit tests for the batch helper (happy/empty/fallback-on- error/fallback-on-mismatch/wrong-dim-per-position) + default-trait ordering and per-position error propagation. Existing `handle_reembed_backfill` integration tests (InertEmbedder) cover the refactored path end-to-end. embed 49/49, reembed 10/10 green.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds Embedder::embed_batch, provider-side batch orchestration with per-text fallback and dimension checks, Cloud/OpenAI-compatible embedder batch wiring, tests for batch/fallback behavior, and a reembed_collect helper used by the backfill handler to batch embeds. ChangesBatched Embedding Architecture and Application
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/openhuman/memory_tree/score/embed/mod.rs (1)
78-84: ⚡ Quick winAdd debug-level telemetry around the new batch paths.
These helpers only emit logs once something is already wrong. There is still no debug trace for batch start/success or for the fallback decision, which makes it hard to confirm from logs whether a backfill actually collapsed into one provider call or silently ran per-text. As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."
Also applies to: 112-142
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_tree/score/embed/mod.rs` around lines 78 - 84, The batch helper embed_batch (and the related batch paths around lines 112-142) lack debug telemetry—add debug-level tracing at entry (including number of texts), at the decision point when you choose the per-text fallback vs a real provider batch call, and on successful completion (including total embeddings produced and duration); also log when you call the external provider and any retry/timeouts. Use the existing tracing/logger used in this module (the same logger used by embed/embed_batch) and include stable, grep-friendly prefixes like "embed_batch:enter", "embed_batch:fallback", "embed_batch:provider_call", and "embed_batch:success" so callers can distinguish start/branch/external-call/success in logs.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/memory_queue/handlers/mod.rs`:
- Around line 754-765: The current reembed_backfill path detects an embed_batch
contract violation (results.len() != readable.len()) and returns an empty Vec,
which causes the caller to treat it as a no-op and repeatedly Defer the same
ids; instead, change the branch in reembed_backfill so that it does not return
an empty Vec: log the error with label and active_sig, and then explicitly
terminate or mark those rows as failed/tombstoned so they are not retried (e.g.,
return a Vec of job outcomes indicating failure/tombstone for each id or
propagate an error up the chain); apply the same fix to the analogous block
around the later occurrence referenced in the comment.
---
Nitpick comments:
In `@src/openhuman/memory_tree/score/embed/mod.rs`:
- Around line 78-84: The batch helper embed_batch (and the related batch paths
around lines 112-142) lack debug telemetry—add debug-level tracing at entry
(including number of texts), at the decision point when you choose the per-text
fallback vs a real provider batch call, and on successful completion (including
total embeddings produced and duration); also log when you call the external
provider and any retry/timeouts. Use the existing tracing/logger used in this
module (the same logger used by embed/embed_batch) and include stable,
grep-friendly prefixes like "embed_batch:enter", "embed_batch:fallback",
"embed_batch:provider_call", and "embed_batch:success" so callers can
distinguish start/branch/external-call/success in logs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9ae6029e-4b59-4191-9023-67b12b2bbdac
📒 Files selected for processing (4)
src/openhuman/memory_queue/handlers/mod.rssrc/openhuman/memory_tree/score/embed/cloud.rssrc/openhuman/memory_tree/score/embed/mod.rssrc/openhuman/memory_tree/score/embed/openai_compat.rs
…d batch telemetry Address CodeRabbit review on the re-embed batching change: - reembed_collect now returns Result and bails when embed_batch returns a result count that does not match the input. Previously it logged and returned an empty Vec, so handle_reembed_backfill wrote nothing yet still returned JobOutcome::Defer, re-selecting the same ids on every revisit (a non-converging chain). Surfacing the error terminates it. - Add debug telemetry to the batch embed path (embed_batch:enter / :success / :fallback) per the repo debug-logging rule, so the single-round-trip win vs the per-text fallback is traceable end-to-end.
Summary
handle_reembed_backfillpreviously embedded each chunk/summary body with a separate sequentialembedder.embed(&body).await— up toREEMBED_BACKFILL_BATCH = 16real Voyage round-trips per batch, chained across the whole re-embed.Embedder::embed_batch(&[&str]) -> Vec<Result<Vec<f32>>>(oneResultper input position) plusembed_batch_via_provider, which issues a single inner-providerembed(&[..])call and falls back to per-textembed_oneon a whole-batch error or a length mismatch.embed_batchinCloudEmbedderandOpenAiCompatEmbedder(both wrap a batch-capableEmbeddingProvider); Ollama keeps the sequential default (its memory-tree path posts a singleprompt, no batch endpoint).reembed_collect. No behavior change beyond the embed call shape — all Per-(row, model) embedding storage for memory_tree + event_log + segments #1574 §6 failure/tombstone semantics and log lines are preserved verbatim.Problem
The genuine network-bound embedding work in memory-sync lives in the re-embed backfill path (extract-job / signature-switch driven), not the per-item ingest loop (which defers embedding to async extract jobs and only does CPU-bound fast scoring synchronously).
handle_reembed_backfillreads each row's stored body and embeds it with a sequentialembedder.embed(&body).await. For the shipped default (logged-in user, cloudCloudEmbedder→ Voyage) every one of those is a real ~50–200ms network RTT, run strictly one after another up to 16 per batch, with the chainDefer-ing to revisit until the whole space is covered. After an embedder switch every prior row is missing at the new signature, so the backfill walks the entire memory tree this way.Solution
Split the embed call shape from the surrounding bookkeeping:
Embedder::embed_batch(new default trait method): takes&[&str], returnsVec<Result<Vec<f32>>>aligned by index (alwaystexts.len()elements). The default is the old sequential loop, so anyEmbedderkeeps working unchanged.embed_batch_via_provider: the shared override body. Issues oneinner.embed(&[..])provider call (the innerEmbeddingProvideris natively batched), dim-checks each vector into its slot, and — on a whole-batchError a returned-length mismatch — falls back to per-textembed_oneso a flaky batch endpoint degrades to the old behavior and per-position error attribution / tombstoning is never lost.CloudEmbedder/OpenAiCompatEmbedderoverrideembed_batchto delegate to that helper. Ollama does not (no batch endpoint), so it keeps the sequential default.reembed_collect(new generic helper): folds the two copy-pasted loops (chunks, summaries) into one. Phase A reads bodies and persistently tombstones read failures; Phase B issues a singleembed_batch; Phase C zips results back to ids and classifies each position exactly as the legacy loop did (pack_checkedok → keep, wrong dim → tombstone"embed wrong dim",Err(e)→ tombstone"embed failed: {e}"). A defensive length-mismatch guard skips the batch without tombstoning (so a later batch retries) rather than misattributing results.The SQLite sidecar write (Phase 3, one short tx) is unchanged. Ordering is irrelevant — results fold into order-independent state keyed by id.
Submission Checklist
embed_batch,embed_batch_via_provider,reembed_collect) is exercised directly by unit tests and the existinghandle_reembed_backfillintegration tests.N/A: internal perf change, no feature row affected.## Related—N/A.FakeProvider/SeqEmbedder, no network.N/A: no release-cut surface touched (background memory-sync job only).Closes #NNN—N/A: no tracking issue.Impact
openhumanlib); background re-embed backfill job only. No UI/CLI surface change.embeddings_provider = "none"the shape is unchanged — this is a background job, not interactive latency.reembed_collectpreserves the Per-(row, model) embedding storage for memory_tree + event_log + segments #1574 §6 contract verbatim — worklist, tombstone reasons ("body read failed: {e}","embed wrong dim","embed failed: {e}"), log prefix[memory::jobs] reembed_backfill:,Defer/Doneoutcomes, and the per-signature "attempt-at-most-once" tombstone guarantee.embed_batch_via_provider_happy_is_single_call(one batch call, no per-text fallback),_empty_makes_no_call,_falls_back_on_batch_error(1 failed batch + N per-text),_falls_back_on_length_mismatch,_maps_wrong_dim_per_position(length matched → single call, bad slot maps toErr), plusdefault_embed_batch_calls_embed_per_textand_preserves_per_position_errors. Focused suites: embed 49 passed, reembed 10 passed.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
perf/reembed-backfill-batchcff8188dd2aa6d0515d6486b9bb2e81c209267f6Validation Run
pnpm --filter openhuman-app format:check—N/A: no app/ (TS) changespnpm typecheck—N/A: no app/ (TS) changescargo test --lib memory_tree::score::embed(49 passed),cargo test --lib reembed(10 passed)cargo fmt+cargo check --libclean (exit 0)N/A: no app/src-tauri changesValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
Defer/Doneoutcomes,REEMBED_BACKFILL_BATCHbound, per-row read/embed log-and-skip, persistent tombstones (mark_chunk_reembed_skipped/mark_summary_reembed_skipped) with identical reason strings, sidecar write tx, progress/summary logs.embed_batchwhole-batch-error and length-mismatch both fall back to per-textembed_one;reembed_collectlength-mismatch guard skips without tombstoning so rows stay re-embeddable; Ollama keeps the sequential default (no batch endpoint).Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Bug Fixes
Tests