perf(memory): batch re-embed backfill into one provider request by mysma-9403 · Pull Request #3392 · tinyhumansai/openhuman

mysma-9403 · 2026-06-04T23:25:50Z

Summary

Collapse the re-embed backfill's per-item embedding into one batched provider request per bounded batch. handle_reembed_backfill previously embedded each chunk/summary body with a separate sequential embedder.embed(&body).await — up to REEMBED_BACKFILL_BATCH = 16 real Voyage round-trips per batch, chained across the whole re-embed.
Add a default Embedder::embed_batch(&[&str]) -> Vec<Result<Vec<f32>>> (one Result per input position) plus embed_batch_via_provider, which issues a single inner-provider embed(&[..]) call and falls back to per-text embed_one on a whole-batch error or a length mismatch.
Override embed_batch in CloudEmbedder and OpenAiCompatEmbedder (both wrap a batch-capable EmbeddingProvider); Ollama keeps the sequential default (its memory-tree path posts a single prompt, no batch endpoint).
Refactor the two near-identical reembed loops into one generic reembed_collect. No behavior change beyond the embed call shape: all failure/tombstone semantics and log lines from #1574 (Per-(row, model) embedding storage for memory_tree + event_log + segments) §6 are preserved verbatim.
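
The default trait method described above can be pictured roughly as follows. This is a synchronous sketch, not the PR's code: the real `Embedder` is async and returns the crate's own `Result` type, so the `String` error type and the `LenEmbedder` stand-in are assumptions made only to keep the example self-contained.

```rust
/// Simplified stand-in for the trait; the real one is async.
trait Embedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;

    /// Default: the old sequential loop, one result per input position.
    fn embed_batch(&self, texts: &[&str]) -> Vec<Result<Vec<f32>, String>> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}

/// Hypothetical embedder: rejects empty input, otherwise returns a 1-d vector.
struct LenEmbedder;

impl Embedder for LenEmbedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        if text.is_empty() {
            Err("empty text".to_string())
        } else {
            Ok(vec![text.len() as f32])
        }
    }
}
```

Calling `LenEmbedder.embed_batch(&["ab", "", "cde"])` yields three slots, with the error confined to the middle position. That per-position alignment is what lets an implementor without a batch endpoint keep working unchanged.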

Problem

The genuine network-bound embedding work in memory-sync lives in the re-embed backfill path (extract-job / signature-switch driven), not the per-item ingest loop (which defers embedding to async extract jobs and only does CPU-bound fast scoring synchronously). handle_reembed_backfill reads each row's stored body and embeds it with a sequential embedder.embed(&body).await. For the shipped default (logged-in user, CloudEmbedder → Voyage) each of those calls is a real ~50–200 ms network round-trip, run strictly one after another, up to 16 per batch, and the job keeps returning Defer and revisiting until the whole space is covered. After an embedder switch every prior row is missing at the new signature, so the backfill walks the entire memory tree this way.
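
For a sense of scale, the arithmetic behind that claim can be written down directly. This is a back-of-the-envelope sketch only: the function names are hypothetical, and it assumes a batched request costs about one round-trip, ignoring any extra server time for the larger payload.

```rust
/// Embed-phase latency of the old shape: one round-trip per row, in milliseconds.
fn serial_embed_ms(rows: u32, rtt_ms: u32) -> u32 {
    rows * rtt_ms
}

/// Embed-phase latency of the batched shape: a single round-trip for any
/// non-empty batch (assumption: payload size does not change the RTT).
fn batched_embed_ms(rows: u32, rtt_ms: u32) -> u32 {
    if rows == 0 {
        0
    } else {
        rtt_ms
    }
}
```

With 16 rows and the quoted 50–200 ms range, the serial shape spends roughly 800–3200 ms per batch on the network, against roughly 50–200 ms for the batched shape.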

Solution

Split the embed call shape from the surrounding bookkeeping:

Embedder::embed_batch (new default trait method): takes &[&str], returns Vec<Result<Vec<f32>>> aligned by index (always texts.len() elements). The default is the old sequential loop, so any Embedder keeps working unchanged.
embed_batch_via_provider: the shared override body. Issues one inner.embed(&[..]) provider call (the inner EmbeddingProvider is natively batched), dim-checks each vector into its slot, and — on a whole-batch Err or a returned-length mismatch — falls back to per-text embed_one so a flaky batch endpoint degrades to the old behavior and per-position error attribution / tombstoning is never lost.
CloudEmbedder / OpenAiCompatEmbedder override embed_batch to delegate to that helper. Ollama does not (no batch endpoint), so it keeps the sequential default.
reembed_collect (new generic helper): folds the two copy-pasted loops (chunks, summaries) into one. Phase A reads bodies and persistently tombstones read failures; Phase B issues a single embed_batch; Phase C zips results back to ids and classifies each position exactly as the legacy loop did (pack_checked ok → keep, wrong dim → tombstone "embed wrong dim", Err(e) → tombstone "embed failed: {e}"). A defensive length-mismatch guard skips the batch without tombstoning (so a later batch retries) rather than misattributing results.
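
The batch-then-fallback shape of embed_batch_via_provider can be sketched as below. This is a simplified synchronous model under stated assumptions, not the merged implementation: the `Provider` trait, the `String` error type, the explicit `dim` parameter, and the `BatchFailsProvider` fake are all illustrative, and the real helper is async.

```rust
use std::cell::Cell;

type EmbedResult = Result<Vec<f32>, String>;

/// Hypothetical stand-in for the batch-capable inner provider.
trait Provider {
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Reject vectors whose length differs from the expected dimension.
fn check_dim(v: Vec<f32>, dim: usize) -> EmbedResult {
    if v.len() == dim {
        Ok(v)
    } else {
        Err(format!("wrong dim: got {}, want {}", v.len(), dim))
    }
}

/// Per-text path: a one-element provider call, then the same dim check.
fn embed_one<P: Provider>(p: &P, text: &str, dim: usize) -> EmbedResult {
    let mut vectors = p.embed(&[text])?;
    if vectors.len() != 1 {
        return Err(format!("expected 1 vector, got {}", vectors.len()));
    }
    check_dim(vectors.remove(0), dim)
}

/// One batched call; on a whole-batch error or a length mismatch,
/// degrade to per-text calls so every input still gets its own slot.
fn embed_batch_via_provider<P: Provider>(p: &P, texts: &[&str], dim: usize) -> Vec<EmbedResult> {
    if texts.is_empty() {
        return Vec::new(); // no provider call for an empty batch
    }
    match p.embed(texts) {
        Ok(vectors) if vectors.len() == texts.len() => {
            vectors.into_iter().map(|v| check_dim(v, dim)).collect()
        }
        _ => texts.iter().map(|t| embed_one(p, t, dim)).collect(),
    }
}

/// Illustrative fake: fails any multi-text request and counts calls.
struct BatchFailsProvider {
    calls: Cell<usize>,
}

impl Provider for BatchFailsProvider {
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        self.calls.set(self.calls.get() + 1);
        if texts.len() > 1 {
            return Err("batch endpoint down".to_string());
        }
        Ok(texts.iter().map(|t| vec![t.len() as f32, 0.0]).collect())
    }
}
```

Against the fake, three inputs cost one failed batch call plus three per-text calls and still return three `Ok` slots, which mirrors the "1 failed batch + N per-text" case named in the test list.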

The SQLite sidecar write (Phase 3, one short tx) is unchanged. Ordering is irrelevant — results fold into order-independent state keyed by id.
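
Phase C of reembed_collect, as described above, amounts to a per-position classification. The sketch below illustrates that mapping under assumptions: the `Classified` enum, the `classify_batch` name, string ids, and the `Option` return for the skipped batch are hypothetical. Only the two tombstone reason strings are taken from the description.

```rust
#[derive(Debug, PartialEq)]
enum Classified {
    /// Vector passed the dimension check and will be written.
    Keep(Vec<f32>),
    /// Row is tombstoned with the given reason.
    Tombstone(String),
}

/// Zip results back to ids. Returns None when the counts disagree, so the
/// caller skips the batch instead of attributing results to the wrong rows.
fn classify_batch(
    ids: &[&str],
    results: Vec<Result<Vec<f32>, String>>,
    dim: usize,
) -> Option<Vec<(String, Classified)>> {
    if results.len() != ids.len() {
        return None;
    }
    let classified = ids
        .iter()
        .zip(results)
        .map(|(id, result)| {
            let outcome = match result {
                Ok(v) if v.len() == dim => Classified::Keep(v),
                Ok(_) => Classified::Tombstone("embed wrong dim".to_string()),
                Err(e) => Classified::Tombstone(format!("embed failed: {e}")),
            };
            (id.to_string(), outcome)
        })
        .collect();
    Some(classified)
}
```

Because each pair carries its own id, the caller can fold the output into id-keyed state in any order, which is why ordering does not matter for the sidecar write.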

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) — 7 new unit tests for the batch helper + default trait method; see Impact.
Diff coverage ≥ 80% — the new executable logic (embed_batch, embed_batch_via_provider, reembed_collect) is exercised directly by unit tests and the existing handle_reembed_backfill integration tests.
Coverage matrix updated — N/A: internal perf change, no feature row affected.
All affected feature IDs from the matrix are listed under ## Related — N/A.
No new external network dependencies introduced — tests use an in-process FakeProvider/SeqEmbedder, no network.
Manual smoke checklist updated — N/A: no release-cut surface touched (background memory-sync job only).
Linked issue closed via Closes #NNN — N/A: no tracking issue.

Impact

Platform: core (openhuman lib); background re-embed backfill job only. No UI/CLI surface change.
Performance: the embed phase of a re-embed backfill batch goes from N serial Voyage round-trips to one batched request (N ≤ 16 per batch). Largest win on a post-switch full re-embed; steady state (small/no backfill) is unaffected. Under Ollama (no batch endpoint) or embeddings_provider = "none" the shape is unchanged — this is a background job, not interactive latency.
Parity: reembed_collect preserves the #1574 (Per-(row, model) embedding storage for memory_tree + event_log + segments) §6 contract verbatim: worklist, tombstone reasons ("body read failed: {e}", "embed wrong dim", "embed failed: {e}"), log prefix [memory::jobs] reembed_backfill:, Defer/Done outcomes, and the per-signature "attempt-at-most-once" tombstone guarantee.
Tests: embed_batch_via_provider_happy_is_single_call (one batch call, no per-text fallback), _empty_makes_no_call, _falls_back_on_batch_error (1 failed batch + N per-text), _falls_back_on_length_mismatch, _maps_wrong_dim_per_position (length matched → single call, bad slot maps to Err), plus default_embed_batch_calls_embed_per_text and _preserves_per_position_errors. Focused suites: embed 49 passed, reembed 10 passed.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: perf/reembed-backfill-batch
Commit SHA: cff8188dd2aa6d0515d6486b9bb2e81c209267f6

Validation Run

pnpm --filter openhuman-app format:check — N/A: no app/ (TS) changes
pnpm typecheck — N/A: no app/ (TS) changes
Focused tests: cargo test --lib memory_tree::score::embed (49 passed), cargo test --lib reembed (10 passed)
Rust fmt/check (if changed): cargo fmt + cargo check --lib clean (exit 0)
Tauri fmt/check (if changed): N/A: no app/src-tauri changes

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: embed-phase batching only (one provider request instead of N serial round-trips per backfill batch).
User-visible effect: faster re-embed backfill (notably after an embedder switch) when using a batch-capable cloud embedder; no change to what gets embedded or stored.

Parity Contract

Legacy behavior preserved: worklist queries, Defer/Done outcomes, REEMBED_BACKFILL_BATCH bound, per-row read/embed log-and-skip, persistent tombstones (mark_chunk_reembed_skipped / mark_summary_reembed_skipped) with identical reason strings, sidecar write tx, progress/summary logs.
Guard/fallback/dispatch parity checks: embed_batch whole-batch-error and length-mismatch both fall back to per-text embed_one; reembed_collect length-mismatch guard skips without tombstoning so rows stay re-embeddable; Ollama keeps the sequential default (no batch endpoint).

Duplicate / Superseded PR Handling

Duplicate PR(s): N/A
Canonical PR: N/A
Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

New Features
- Batched embedding for bulk text processing, used during re-embedding/backfill to improve throughput and efficiency.
- Per-item handling that records skipped items when embedding fails or dimensions mismatch.
Bug Fixes
- Enforced strict embedding-dimension validation to avoid silent misalignments and surface per-item errors.
Tests
- Added deterministic async tests covering batch happy-paths, fallbacks, and per-item error cases.

`handle_reembed_backfill` embedded each chunk/summary body with a separate sequential `embedder.embed(&body).await`: N real Voyage round-trips per bounded batch (REEMBED_BACKFILL_BATCH = 16). Collapse those into a single batched provider request.

- Add a default `Embedder::embed_batch(&[&str]) -> Vec<Result<Vec<f32>>>` (sequential fallback) plus `embed_batch_via_provider`, which issues one inner-provider `embed(&[..])` call and falls back to per-text `embed_one` on a whole-batch error or a length mismatch so per-position error attribution / tombstoning is preserved.
- Override `embed_batch` in `CloudEmbedder` and `OpenAiCompatEmbedder` (both wrap a batch-capable `EmbeddingProvider`); Ollama keeps the sequential default (its memory-tree path has no batch endpoint).
- Refactor the two reembed loops into one generic `reembed_collect` (Phase A: read bodies + tombstone read failures; Phase B: one `embed_batch`; Phase C: classify per position exactly as before).

All tinyhumansai#1574 §6 failure/tombstone semantics and log lines are preserved verbatim; only the embed call shape changes.

Tests: 7 new unit tests for the batch helper (happy/empty/fallback-on-error/fallback-on-mismatch/wrong-dim-per-position) + default-trait ordering and per-position error propagation. Existing `handle_reembed_backfill` integration tests (InertEmbedder) cover the refactored path end-to-end. embed 49/49, reembed 10/10 green.

coderabbitai · 2026-06-04T23:26:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69af5859-9e51-4fe5-945a-ff055ca71bba

📥 Commits

Reviewing files that changed from the base of the PR and between cff8188 and e8f25b5.

📒 Files selected for processing (2)

src/openhuman/memory_queue/handlers/mod.rs
src/openhuman/memory_tree/score/embed/mod.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/openhuman/memory_queue/handlers/mod.rs

📝 Walkthrough

Walkthrough

Adds Embedder::embed_batch, provider-side batch orchestration with per-text fallback and dimension checks, Cloud/OpenAI-compatible embedder batch wiring, tests for batch/fallback behavior, and a reembed_collect helper used by the backfill handler to batch embeds.

Changes

Batched Embedding Architecture and Application

Layer / File(s)	Summary
Embedder trait and validation `src/openhuman/memory_tree/score/embed/mod.rs`	Adds a default `embed_batch` and `check_embed_dim` to validate embedding length and preserve per-input result slots.
Provider batch orchestration and per-text fallback `src/openhuman/memory_tree/score/embed/mod.rs`	`embed_batch_via_provider` attempts a provider batch call and falls back to `embed_each_via_provider` on batch errors or response-length mismatches; per-text fallback attaches context and re-validates dimensions.
Batch tests `src/openhuman/memory_tree/score/embed/mod.rs`	Async tests (FakeProvider, SeqEmbedder) verify batch collapse, empty-input, fallback on errors or mismatches, and per-position error mapping.
Embedder implementations with batch support `src/openhuman/memory_tree/score/embed/cloud.rs`, `src/openhuman/memory_tree/score/embed/openai_compat.rs`	CloudEmbedder and OpenAiCompatEmbedder implement `embed_batch` by delegating to `embed_batch_via_provider`.
Backfill handler batch embedding integration `src/openhuman/memory_queue/handlers/mod.rs`	Adds `reembed_collect` to read bodies, tombstone unreadable rows, call `embed_batch` once, enforce result-count alignment, persist per-row skip tombstones for wrong-dimension or embed errors, and updates `handle_reembed_backfill` to use it.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hop through batches, vectors in a line,
One call for many, tidy and fine.
If one goes sideways, I mark where it slipped,
I log and I tombstone, the rest stay equipped.
A rabbit’s small cheer for embedding refined.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the primary change: batching re-embed backfill requests into a single provider request for performance improvement, which directly reflects the core optimization in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/openhuman/memory_tree/score/embed/mod.rs (1)
78-84: ⚡ Quick win

Add debug-level telemetry around the new batch paths.

These helpers only emit logs once something is already wrong. There is still no debug trace for batch start/success or for the fallback decision, which makes it hard to confirm from logs whether a backfill actually collapsed into one provider call or silently ran per-text. As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."

Also applies to: 112-142
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/score/embed/mod.rs` around lines 78 - 84, The batch
helper embed_batch (and the related batch paths around lines 112-142) lack debug
telemetry—add debug-level tracing at entry (including number of texts), at the
decision point when you choose the per-text fallback vs a real provider batch
call, and on successful completion (including total embeddings produced and
duration); also log when you call the external provider and any retry/timeouts.
Use the existing tracing/logger used in this module (the same logger used by
embed/embed_batch) and include stable, grep-friendly prefixes like
"embed_batch:enter", "embed_batch:fallback", "embed_batch:provider_call", and
"embed_batch:success" so callers can distinguish
start/branch/external-call/success in logs.
Source: Coding guidelines
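
A minimal sketch of what the requested grep-friendly lines could look like. The four prefixes are the ones named in the comment; the `BatchEvent` type, the field names, and the `render` helper are hypothetical, and real code would emit these through the module's existing logger at debug level rather than building strings.

```rust
use std::time::Duration;

/// Hypothetical events on the batch embed path.
enum BatchEvent<'a> {
    Enter { texts: usize },
    ProviderCall { texts: usize },
    Fallback { reason: &'a str },
    Success { produced: usize, elapsed: Duration },
}

/// Render one event as a single line with a stable prefix.
fn render(event: &BatchEvent<'_>) -> String {
    match event {
        BatchEvent::Enter { texts } => format!("embed_batch:enter texts={texts}"),
        BatchEvent::ProviderCall { texts } => {
            format!("embed_batch:provider_call texts={texts}")
        }
        BatchEvent::Fallback { reason } => format!("embed_batch:fallback reason={reason}"),
        BatchEvent::Success { produced, elapsed } => format!(
            "embed_batch:success produced={produced} elapsed_ms={}",
            elapsed.as_millis()
        ),
    }
}
```

With lines like these, a backfill that collapsed into one provider call shows a single `embed_batch:provider_call` with no `embed_batch:fallback` between `enter` and `success`.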

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_queue/handlers/mod.rs`:
- Around line 754-765: The current reembed_backfill path detects an embed_batch
contract violation (results.len() != readable.len()) and returns an empty Vec,
which causes the caller to treat it as a no-op and repeatedly Defer the same
ids; instead, change the branch in reembed_backfill so that it does not return
an empty Vec: log the error with label and active_sig, and then explicitly
terminate or mark those rows as failed/tombstoned so they are not retried (e.g.,
return a Vec of job outcomes indicating failure/tombstone for each id or
propagate an error up the chain); apply the same fix to the analogous block
around the later occurrence referenced in the comment.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ae6029e-4b59-4191-9023-67b12b2bbdac

📥 Commits

Reviewing files that changed from the base of the PR and between e1d25d8 and cff8188.

📒 Files selected for processing (4)

src/openhuman/memory_queue/handlers/mod.rs
src/openhuman/memory_tree/score/embed/cloud.rs
src/openhuman/memory_tree/score/embed/mod.rs
src/openhuman/memory_tree/score/embed/openai_compat.rs

…d batch telemetry

Address CodeRabbit review on the re-embed batching change:

- reembed_collect now returns Result and bails when embed_batch returns a result count that does not match the input. Previously it logged and returned an empty Vec, so handle_reembed_backfill wrote nothing yet still returned JobOutcome::Defer, re-selecting the same ids on every revisit (a non-converging chain). Surfacing the error terminates it.
- Add debug telemetry to the batch embed path (embed_batch:enter / :success / :fallback) per the repo debug-logging rule, so the single-round-trip win vs the per-text fallback is traceable end-to-end.
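
Why surfacing the error ends the chain can be shown with a toy model. Everything here is an assumption for illustration: the `collect` and `handle` functions, integer ids, the `String` error, and a batch bound of 2 (the PR's bound is REEMBED_BACKFILL_BATCH = 16). The real handler is async and writes to SQLite.

```rust
#[derive(Debug, PartialEq)]
enum JobOutcome {
    Defer,
    Done,
}

const BATCH: usize = 2; // toy bound; the PR uses REEMBED_BACKFILL_BATCH = 16

/// Returns the ids that can be written, or an error when the embedder
/// returned a result count that does not match the input count.
fn collect(ids: &[u32], results_len: usize) -> Result<Vec<u32>, String> {
    if results_len != ids.len() {
        return Err(format!(
            "embed_batch returned {} results for {} inputs",
            results_len,
            ids.len()
        ));
    }
    Ok(ids.to_vec())
}

/// One revisit of the job: process up to BATCH pending ids.
/// A mismatch propagates as Err instead of silently writing nothing.
fn handle(pending: &mut Vec<u32>, results_len: usize) -> Result<JobOutcome, String> {
    let take = pending.len().min(BATCH);
    let written = collect(&pending[..take], results_len)?;
    pending.retain(|id| !written.contains(id));
    Ok(if pending.is_empty() {
        JobOutcome::Done
    } else {
        JobOutcome::Defer
    })
}
```

Had `collect` returned an empty list on mismatch, `handle` would leave `pending` untouched and return `Defer`, so every revisit would select the same ids again. With `Err`, the caller sees the failure on the first attempt.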

mysma-9403 requested a review from a team June 4, 2026 23:25

coderabbitai Bot requested changes Jun 4, 2026


coderabbitai Bot approved these changes Jun 5, 2026


senamakel merged commit a1cd789 into tinyhumansai:main Jun 5, 2026
22 checks passed

senamakel merged 2 commits into tinyhumansai:main from mysma-9403:perf/reembed-backfill-batch
