perf(memory): batch re-embed backfill into one provider request#3392

Merged
senamakel merged 2 commits into
tinyhumansai:mainfrom
mysma-9403:perf/reembed-backfill-batch
Jun 5, 2026

@mysma-9403 mysma-9403 commented Jun 4, 2026

Summary

  • Collapse the re-embed backfill's per-item embedding into one batched provider request per bounded batch. handle_reembed_backfill previously embedded each chunk/summary body with a separate sequential embedder.embed(&body).await — up to REEMBED_BACKFILL_BATCH = 16 real Voyage round-trips per batch, chained across the whole re-embed.
  • Add a default Embedder::embed_batch(&[&str]) -> Vec<Result<Vec<f32>>> (one Result per input position) plus embed_batch_via_provider, which issues a single inner-provider embed(&[..]) call and falls back to per-text embed_one on a whole-batch error or a length mismatch.
  • Override embed_batch in CloudEmbedder and OpenAiCompatEmbedder (both wrap a batch-capable EmbeddingProvider); Ollama keeps the sequential default (its memory-tree path posts a single prompt, no batch endpoint).
  • Refactor the two near-identical reembed loops into one generic reembed_collect. No behavior change beyond the embed call shape: all #1574 §6 ("Per-(row, model) embedding storage for memory_tree + event_log + segments") failure/tombstone semantics and log lines are preserved verbatim.

Problem

The genuine network-bound embedding work in memory-sync lives in the re-embed backfill path (extract-job / signature-switch driven), not the per-item ingest loop (which defers embedding to async extract jobs and only does CPU-bound fast scoring synchronously). handle_reembed_backfill reads each row's stored body and embeds it with a sequential embedder.embed(&body).await. For the shipped default (logged-in user, cloud CloudEmbedder → Voyage) each of those calls is a real ~50–200 ms network round-trip, run strictly one after another, up to 16 per batch, and the job keeps returning Defer to revisit until the whole space is covered. After an embedder switch every prior row is missing at the new signature, so the backfill walks the entire memory tree this way.
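To put rough numbers on this, the sketch below works the arithmetic for one bounded batch using the figures quoted above (16 items, roughly 50 to 200 ms per round-trip). The helper names are illustrative and not part of the codebase, and it assumes a batched request costs about one round-trip, which ignores any extra server-side time for the larger payload.

```rust
/// Batch bound quoted in the description (REEMBED_BACKFILL_BATCH = 16).
const REEMBED_BACKFILL_BATCH: u64 = 16;

/// Wall-clock wait when `n` bodies are embedded one after another.
/// Hypothetical helper, for illustration only.
fn serial_ms(n: u64, rtt_ms: u64) -> u64 {
    n * rtt_ms
}

/// Wall-clock wait when all `n` bodies share a single request.
/// Assumption: one batched request costs roughly one round-trip.
fn batched_ms(n: u64, rtt_ms: u64) -> u64 {
    if n == 0 {
        0
    } else {
        rtt_ms
    }
}
```

Under these assumptions a full batch goes from about 0.8 to 3.2 s of serial waiting down to about 0.05 to 0.2 s, and the saving repeats for every batch in a post-switch re-embed.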

Solution

Split the embed call shape from the surrounding bookkeeping:

  • Embedder::embed_batch (new default trait method): takes &[&str], returns Vec<Result<Vec<f32>>> aligned by index (always texts.len() elements). The default is the old sequential loop, so any Embedder keeps working unchanged.
  • embed_batch_via_provider: the shared override body. Issues one inner.embed(&[..]) provider call (the inner EmbeddingProvider is natively batched), dim-checks each vector into its slot, and — on a whole-batch Err or a returned-length mismatch — falls back to per-text embed_one so a flaky batch endpoint degrades to the old behavior and per-position error attribution / tombstoning is never lost.
  • CloudEmbedder / OpenAiCompatEmbedder override embed_batch to delegate to that helper. Ollama does not (no batch endpoint), so it keeps the sequential default.
  • reembed_collect (new generic helper): folds the two copy-pasted loops (chunks, summaries) into one. Phase A reads bodies and persistently tombstones read failures; Phase B issues a single embed_batch; Phase C zips results back to ids and classifies each position exactly as the legacy loop did (pack_checked ok → keep, wrong dim → tombstone "embed wrong dim", Err(e) → tombstone "embed failed: {e}"). A defensive length-mismatch guard skips the batch without tombstoning (so a later batch retries) rather than misattributing results.
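The batch-then-fallback shape described in the list above can be sketched as follows. This is a simplified, synchronous, std-only approximation: the real code is async, and the trait signature, the `String` error type, the error messages, and `FakeProvider` are assumptions made for illustration, not the PR's actual definitions.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the batch-capable inner provider.
trait EmbeddingProvider {
    /// One request for many texts; one vector per text on success.
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Reject vectors whose length is not the expected dimension.
fn check_embed_dim(v: Vec<f32>, dim: usize) -> Result<Vec<f32>, String> {
    if v.len() == dim {
        Ok(v)
    } else {
        Err(format!("wrong dim: got {}, want {}", v.len(), dim))
    }
}

/// Per-text path: one provider call per input, used as the fallback.
fn embed_one<P: EmbeddingProvider>(p: &P, text: &str, dim: usize) -> Result<Vec<f32>, String> {
    let mut vecs = p.embed(&[text])?;
    if vecs.len() != 1 {
        return Err(format!("expected 1 vector, got {}", vecs.len()));
    }
    check_embed_dim(vecs.remove(0), dim)
}

/// Batched path: one provider call, one Result per input position.
fn embed_batch_via_provider<P: EmbeddingProvider>(
    p: &P,
    texts: &[&str],
    dim: usize,
) -> Vec<Result<Vec<f32>, String>> {
    if texts.is_empty() {
        return Vec::new(); // nothing to embed, so no provider call
    }
    match p.embed(texts) {
        // Aligned response: dim-check each vector into its own slot.
        Ok(vecs) if vecs.len() == texts.len() => {
            vecs.into_iter().map(|v| check_embed_dim(v, dim)).collect()
        }
        // Whole-batch error or length mismatch: degrade to per-text calls
        // so every position still gets its own Ok or Err.
        _ => texts.iter().map(|t| embed_one(p, t, dim)).collect(),
    }
}

/// Test double: counts calls and can reject multi-text requests.
struct FakeProvider {
    calls: Cell<usize>,
    fail_batches: bool,
}

impl EmbeddingProvider for FakeProvider {
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        self.calls.set(self.calls.get() + 1);
        if self.fail_batches && texts.len() > 1 {
            return Err("batch endpoint unavailable".to_string());
        }
        // Toy 2-dimensional embedding derived from the text length.
        Ok(texts.iter().map(|t| vec![t.len() as f32, 1.0]).collect())
    }
}
```

With a healthy provider, three texts cost one call. With `fail_batches` set, the same three texts cost one failed batch call plus three per-text calls, which matches the "1 failed batch + N per-text" shape the fallback test describes.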

The SQLite sidecar write (Phase 3, one short tx) is unchanged. Ordering is irrelevant — results fold into order-independent state keyed by id.
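A minimal sketch of the Phase C classification, assuming plain `u64` row ids and `String` errors. `RowOutcome` and `classify` are hypothetical names; the real code goes through pack_checked and the mark_*_reembed_skipped helpers, which are not reproduced here. Only the two embed-phase reason strings are taken from the description.

```rust
use std::collections::HashMap;

/// What happens to one row after the embed phase (illustrative type).
#[derive(Debug, PartialEq)]
enum RowOutcome {
    /// Vector accepted; it would be written in the sidecar transaction.
    Keep(Vec<f32>),
    /// Row is persistently skipped, with the reason string recorded.
    Tombstone(String),
}

/// Zip per-position results back to ids and classify each position.
/// Returns None on a count mismatch, so no result is ever attributed
/// to the wrong id; how the caller reacts is out of scope here.
fn classify(
    ids: &[u64],
    results: Vec<Result<Vec<f32>, String>>,
    dim: usize,
) -> Option<HashMap<u64, RowOutcome>> {
    if ids.len() != results.len() {
        return None;
    }
    // Keyed by id, so the order in which results are folded in is irrelevant.
    let mut by_id = HashMap::new();
    for (id, res) in ids.iter().zip(results) {
        let outcome = match res {
            Ok(v) if v.len() == dim => RowOutcome::Keep(v),
            Ok(_) => RowOutcome::Tombstone("embed wrong dim".to_string()),
            Err(e) => RowOutcome::Tombstone(format!("embed failed: {}", e)),
        };
        by_id.insert(*id, outcome);
    }
    Some(by_id)
}
```

Because the map is keyed by row id, a caller can apply the outcomes in any order, which is why result ordering does not matter once the zip has paired each result with its id.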

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) — 7 new unit tests for the batch helper + default trait method; see Impact.
  • Diff coverage ≥ 80% — the new executable logic (embed_batch, embed_batch_via_provider, reembed_collect) is exercised directly by unit tests and the existing handle_reembed_backfill integration tests.
  • Coverage matrix updated — N/A: internal perf change, no feature row affected.
  • All affected feature IDs from the matrix are listed under ## Related: N/A.
  • No new external network dependencies introduced — tests use an in-process FakeProvider/SeqEmbedder, no network.
  • Manual smoke checklist updated — N/A: no release-cut surface touched (background memory-sync job only).
  • Linked issue closed via Closes #NNN: N/A, no tracking issue.

Impact

  • Platform: core (openhuman lib); background re-embed backfill job only. No UI/CLI surface change.
  • Performance: the embed phase of a re-embed backfill batch goes from N serial Voyage round-trips to one batched request (N ≤ 16 per batch). Largest win on a post-switch full re-embed; steady state (small/no backfill) is unaffected. Under Ollama (no batch endpoint) or embeddings_provider = "none" the shape is unchanged — this is a background job, not interactive latency.
  • Parity: reembed_collect preserves the #1574 §6 contract ("Per-(row, model) embedding storage for memory_tree + event_log + segments") verbatim: worklist, tombstone reasons ("body read failed: {e}", "embed wrong dim", "embed failed: {e}"), log prefix [memory::jobs] reembed_backfill:, Defer/Done outcomes, and the per-signature "attempt-at-most-once" tombstone guarantee.
  • Tests: embed_batch_via_provider_happy_is_single_call (one batch call, no per-text fallback), _empty_makes_no_call, _falls_back_on_batch_error (1 failed batch + N per-text), _falls_back_on_length_mismatch, _maps_wrong_dim_per_position (length matched → single call, bad slot maps to Err), plus default_embed_batch_calls_embed_per_text and _preserves_per_position_errors. Focused suites: embed 49 passed, reembed 10 passed.
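The two default-trait tests named above check ordering and per-position error propagation. A simplified, synchronous sketch of that default method and a recording test double might look like this. The real Embedder trait is async, and the signatures, the `String` error type, and the shape of `SeqEmbedder` are assumptions for illustration.

```rust
use std::cell::RefCell;

/// Simplified stand-in for the Embedder trait (the real one is async).
trait Embedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;

    /// Default: embed each text in order, one Result per input position.
    /// Implementations without a batch endpoint can keep this as-is.
    fn embed_batch(&self, texts: &[&str]) -> Vec<Result<Vec<f32>, String>> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}

/// Test double: records the order of calls and fails on empty input.
struct SeqEmbedder {
    seen: RefCell<Vec<String>>,
}

impl Embedder for SeqEmbedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        self.seen.borrow_mut().push(text.to_string());
        if text.is_empty() {
            Err("empty body".to_string())
        } else {
            // Toy 1-dimensional embedding: the text length.
            Ok(vec![text.len() as f32])
        }
    }
}
```

Calling `embed_batch(&["a", "", "ccc"])` on this double yields Ok, Err, Ok in that order, so a failure in the middle neither shifts nor hides the results around it.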

Related

  • Closes: N/A
  • Follow-up PR(s)/TODOs: N/A

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: perf/reembed-backfill-batch
  • Commit SHA: cff8188dd2aa6d0515d6486b9bb2e81c209267f6

Validation Run

  • pnpm --filter openhuman-app format:check: N/A, no app/ (TS) changes
  • pnpm typecheck: N/A, no app/ (TS) changes
  • Focused tests: cargo test --lib memory_tree::score::embed (49 passed), cargo test --lib reembed (10 passed)
  • Rust fmt/check (if changed): cargo fmt + cargo check --lib clean (exit 0)
  • Tauri fmt/check (if changed): N/A: no app/src-tauri changes

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: embed-phase batching only (one provider request instead of N serial round-trips per backfill batch).
  • User-visible effect: faster re-embed backfill (notably after an embedder switch) when using a batch-capable cloud embedder; no change to what gets embedded or stored.

Parity Contract

  • Legacy behavior preserved: worklist queries, Defer/Done outcomes, REEMBED_BACKFILL_BATCH bound, per-row read/embed log-and-skip, persistent tombstones (mark_chunk_reembed_skipped / mark_summary_reembed_skipped) with identical reason strings, sidecar write tx, progress/summary logs.
  • Guard/fallback/dispatch parity checks: embed_batch whole-batch-error and length-mismatch both fall back to per-text embed_one; reembed_collect length-mismatch guard skips without tombstoning so rows stay re-embeddable; Ollama keeps the sequential default (no batch endpoint).

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • New Features

    • Batched embedding for bulk text processing, used during re-embedding/backfill to improve throughput and efficiency.
    • Per-item handling that records skipped items when embedding fails or dimensions mismatch.
  • Bug Fixes

    • Enforced strict embedding-dimension validation to avoid silent misalignments and surface per-item errors.
  • Tests

    • Added deterministic async tests covering batch happy-paths, fallbacks, and per-item error cases.

`handle_reembed_backfill` embedded each chunk/summary body with a
separate sequential `embedder.embed(&body).await` — N real Voyage
round-trips per bounded batch (REEMBED_BACKFILL_BATCH = 16). Collapse
those into a single batched provider request.

- Add a default `Embedder::embed_batch(&[&str]) -> Vec<Result<Vec<f32>>>`
  (sequential fallback) plus `embed_batch_via_provider`, which issues one
  inner-provider `embed(&[..])` call and falls back to per-text
  `embed_one` on a whole-batch error or a length mismatch so per-position
  error attribution / tombstoning is preserved.
- Override `embed_batch` in `CloudEmbedder` and `OpenAiCompatEmbedder`
  (both wrap a batch-capable `EmbeddingProvider`); Ollama keeps the
  sequential default (its memory-tree path has no batch endpoint).
- Refactor the two reembed loops into one generic `reembed_collect`
  (Phase A: read bodies + tombstone read failures; Phase B: one
  `embed_batch`; Phase C: classify per position exactly as before). All
  tinyhumansai#1574 §6 failure/tombstone semantics and log lines are preserved
  verbatim — only the embed call shape changes.

Tests: 7 new unit tests for the batch helper (happy/empty/fallback-on-
error/fallback-on-mismatch/wrong-dim-per-position) + default-trait
ordering and per-position error propagation. Existing
`handle_reembed_backfill` integration tests (InertEmbedder) cover the
refactored path end-to-end. embed 49/49, reembed 10/10 green.
@mysma-9403 mysma-9403 requested a review from a team June 4, 2026 23:25

coderabbitai Bot commented Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69af5859-9e51-4fe5-945a-ff055ca71bba

📥 Commits

Reviewing files that changed from the base of the PR and between cff8188 and e8f25b5.

📒 Files selected for processing (2)
  • src/openhuman/memory_queue/handlers/mod.rs
  • src/openhuman/memory_tree/score/embed/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/memory_queue/handlers/mod.rs

Walkthrough

Adds Embedder::embed_batch, provider-side batch orchestration with per-text fallback and dimension checks, Cloud/OpenAI-compatible embedder batch wiring, tests for batch/fallback behavior, and a reembed_collect helper used by the backfill handler to batch embeds.

Changes

Batched Embedding Architecture and Application

| Layer | File(s) | Summary |
|---|---|---|
| Embedder trait and validation | src/openhuman/memory_tree/score/embed/mod.rs | Adds a default embed_batch and check_embed_dim to validate embedding length and preserve per-input result slots. |
| Provider batch orchestration and per-text fallback | src/openhuman/memory_tree/score/embed/mod.rs | embed_batch_via_provider attempts a provider batch call and falls back to embed_each_via_provider on batch errors or response-length mismatches; per-text fallback attaches context and re-validates dimensions. |
| Batch tests | src/openhuman/memory_tree/score/embed/mod.rs | Async tests (FakeProvider, SeqEmbedder) verify batch collapse, empty-input, fallback on errors or mismatches, and per-position error mapping. |
| Embedder implementations with batch support | src/openhuman/memory_tree/score/embed/cloud.rs, src/openhuman/memory_tree/score/embed/openai_compat.rs | CloudEmbedder and OpenAiCompatEmbedder implement embed_batch by delegating to embed_batch_via_provider. |
| Backfill handler batch embedding integration | src/openhuman/memory_queue/handlers/mod.rs | Adds reembed_collect to read bodies, tombstone unreadable rows, call embed_batch once, enforce result-count alignment, persist per-row skip tombstones for wrong-dimension or embed errors, and updates handle_reembed_backfill to use it. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hop through batches, vectors in a line,
One call for many, tidy and fine.
If one goes sideways, I mark where it slipped,
I log and I tombstone, the rest stay equipped.
A rabbit’s small cheer for embedding refined.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the primary change: batching re-embed backfill requests into a single provider request for performance improvement, which directly reflects the core optimization in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/memory_tree/score/embed/mod.rs (1)

78-84: ⚡ Quick win

Add debug-level telemetry around the new batch paths.

These helpers only emit logs once something is already wrong. There is still no debug trace for batch start/success or for the fallback decision, which makes it hard to confirm from logs whether a backfill actually collapsed into one provider call or silently ran per-text. As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."

Also applies to: 112-142

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/score/embed/mod.rs` around lines 78 - 84, The batch
helper embed_batch (and the related batch paths around lines 112-142) lack debug
telemetry—add debug-level tracing at entry (including number of texts), at the
decision point when you choose the per-text fallback vs a real provider batch
call, and on successful completion (including total embeddings produced and
duration); also log when you call the external provider and any retry/timeouts.
Use the existing tracing/logger used in this module (the same logger used by
embed/embed_batch) and include stable, grep-friendly prefixes like
"embed_batch:enter", "embed_batch:fallback", "embed_batch:provider_call", and
"embed_batch:success" so callers can distinguish
start/branch/external-call/success in logs.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_queue/handlers/mod.rs`:
- Around line 754-765: The current reembed_backfill path detects an embed_batch
contract violation (results.len() != readable.len()) and returns an empty Vec,
which causes the caller to treat it as a no-op and repeatedly Defer the same
ids; instead, change the branch in reembed_backfill so that it does not return
an empty Vec: log the error with label and active_sig, and then explicitly
terminate or mark those rows as failed/tombstoned so they are not retried (e.g.,
return a Vec of job outcomes indicating failure/tombstone for each id or
propagate an error up the chain); apply the same fix to the analogous block
around the later occurrence referenced in the comment.

---

Nitpick comments:
In `@src/openhuman/memory_tree/score/embed/mod.rs`:
- Around line 78-84: The batch helper embed_batch (and the related batch paths
around lines 112-142) lack debug telemetry—add debug-level tracing at entry
(including number of texts), at the decision point when you choose the per-text
fallback vs a real provider batch call, and on successful completion (including
total embeddings produced and duration); also log when you call the external
provider and any retry/timeouts. Use the existing tracing/logger used in this
module (the same logger used by embed/embed_batch) and include stable,
grep-friendly prefixes like "embed_batch:enter", "embed_batch:fallback",
"embed_batch:provider_call", and "embed_batch:success" so callers can
distinguish start/branch/external-call/success in logs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ae6029e-4b59-4191-9023-67b12b2bbdac

📥 Commits

Reviewing files that changed from the base of the PR and between e1d25d8 and cff8188.

📒 Files selected for processing (4)
  • src/openhuman/memory_queue/handlers/mod.rs
  • src/openhuman/memory_tree/score/embed/cloud.rs
  • src/openhuman/memory_tree/score/embed/mod.rs
  • src/openhuman/memory_tree/score/embed/openai_compat.rs

Comment thread src/openhuman/memory_queue/handlers/mod.rs
…d batch telemetry

Address CodeRabbit review on the re-embed batching change:

- reembed_collect now returns Result and bails when embed_batch returns
  a result count that does not match the input. Previously it logged and
  returned an empty Vec, so handle_reembed_backfill wrote nothing yet
  still returned JobOutcome::Defer, re-selecting the same ids on every
  revisit (a non-converging chain). Surfacing the error terminates it.
- Add debug telemetry to the batch embed path (embed_batch:enter /
  :success / :fallback) per the repo debug-logging rule, so the
  single-round-trip win vs the per-text fallback is traceable end-to-end.
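The first fix above turns a result-count mismatch into an error instead of an empty result. A minimal sketch of why that terminates the chain, assuming a drastically simplified handler step: `ensure_aligned` and `backfill_step` are hypothetical names, and JobOutcome is reduced here to the two variants mentioned in the description.

```rust
/// Reduced to the two outcomes named in the description.
#[derive(Debug, PartialEq)]
enum JobOutcome {
    Done,
    Defer,
}

/// Guard: embed_batch must return exactly one result per input.
fn ensure_aligned(inputs: usize, results: usize) -> Result<(), String> {
    if inputs == results {
        Ok(())
    } else {
        Err(format!(
            "embed_batch returned {} results for {} inputs",
            results, inputs
        ))
    }
}

/// Hypothetical single step of the backfill job.
/// `pending` is how many rows the worklist selected and `results` is
/// how many embed results came back for them.
fn backfill_step(pending: usize, results: usize) -> Result<JobOutcome, String> {
    if pending == 0 {
        return Ok(JobOutcome::Done); // nothing left to re-embed
    }
    // A mismatch surfaces as Err here. Returning Defer instead would
    // re-select the same rows on every revisit without ever converging.
    ensure_aligned(pending, results)?;
    // Aligned: rows were processed, so revisit for the next batch.
    Ok(JobOutcome::Defer)
}
```

In this sketch a contract violation is the only path that ends the job with an error; aligned batches still return Defer until the worklist is empty.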
@senamakel senamakel merged commit a1cd789 into tinyhumansai:main Jun 5, 2026
22 checks passed