perf(embeddings): cross-node batching + worker pool #33
Merged
## Summary
- Refactors the embeddings phase from one-embedding-per-node-per-await
into two stages: a **job-collection** pass that walks
symbol/file/community tiers in canonical order producing `{text,
emitRow}` records, and a **dispatch** loop that fires `workers ×
batchSize` embeds concurrently per wave and scatters vectors back into
the row buffer (see the dispatch sketch after this list).
- Adds a Piscina pool of independent `OnnxEmbedder` workers
(`packages/ingestion/src/pipeline/phases/embedder-{worker,pool}.ts`).
Each worker holds its own ONNX session; the pool is exposed behind an
`Embedder`-shaped facade so the phase doesn't branch (see the pool
sketch after this list). A main-thread canary `OnnxEmbedder` opens
first so `EmbedderNotSetupError` keeps its class identity across the
structured-clone boundary.
- New flags: `--embeddings-workers <n|auto>` and
`--embeddings-batch-size <n>` (defaults: 1 and 32, i.e. unchanged
single-threaded behaviour out of the box).
- The HTTP backend is unaffected: the pool flag is ignored when an
endpoint is set.
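
A minimal sketch of the two-stage dispatch shape. The `EmbedJob` and
`runEmbeddingsPhase` names are illustrative, not the actual internals:

```ts
// Hypothetical shapes -- illustrative names, not the actual internals.
interface EmbedJob {
  text: string;
  emitRow: (vector: Float32Array) => void; // scatters one vector into the row buffer
}

async function runEmbeddingsPhase(
  jobs: EmbedJob[], // collected once, in canonical tier order
  embedBatch: (texts: string[]) => Promise<Float32Array[]>,
  workers: number,
  batchSize: number,
): Promise<void> {
  const wave = workers * batchSize; // embeds in flight per wave
  for (let i = 0; i < jobs.length; i += wave) {
    const slice = jobs.slice(i, i + wave);
    // Split the wave into worker-sized batches and await them together.
    const batches: EmbedJob[][] = [];
    for (let j = 0; j < slice.length; j += batchSize) {
      batches.push(slice.slice(j, j + batchSize));
    }
    const vectors = await Promise.all(
      batches.map((batch) => embedBatch(batch.map((job) => job.text))),
    );
    // Scatter results back in job order, so the row layout never depends
    // on which batch happened to finish first.
    batches.forEach((batch, b) =>
      batch.forEach((job, k) => job.emitRow(vectors[b][k])),
    );
  }
}
```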
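
And a sketch of the pool facade. Piscina's `run`/`destroy` API is real;
the module and interface names are assumptions:

```ts
import Piscina from "piscina";

// Hypothetical facade -- the real `Embedder` interface may differ.
interface Embedder {
  embedBatch(texts: string[]): Promise<Float32Array[]>;
  close(): Promise<void>;
}

function openPooledEmbedder(workers: number): Embedder {
  const pool = new Piscina({
    // Each worker module opens its own deterministic ONNX session.
    filename: new URL("./embedder-worker.js", import.meta.url).href,
    maxThreads: workers,
  });
  return {
    // Piscina structured-clones arguments and results across threads,
    // which is why the canary `OnnxEmbedder` must open on the main thread
    // first: a setup failure there throws a real `EmbedderNotSetupError`,
    // not a cloned worker error stripped of its class identity.
    embedBatch: (texts) => pool.run(texts) as Promise<Float32Array[]>,
    close: () => pool.destroy(),
  };
}
```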
### Motivation
Real-world `codehub analyze --embeddings --force --granularity
symbol,file,community` on a ~1,922-file AWS codebase sat at 95% CPU for
7+ minutes before the refactor. The phase was awaiting `embedBatch()`
per node inside a single-threaded ONNX session (`intraOpNumThreads: 1`,
`graphOptimizationLevel: "disabled"` — required for the graphHash
determinism contract), so there was no concurrency anywhere in the
stack.
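
For reference, the determinism-constrained session corresponds to
roughly this in onnxruntime-node (the model path is illustrative):

```ts
import * as ort from "onnxruntime-node";

// The two options are the determinism knobs named above, as exposed
// by onnxruntime-node.
const session = await ort.InferenceSession.create("./model.onnx", {
  intraOpNumThreads: 1, // no intra-op thread parallelism
  graphOptimizationLevel: "disabled", // no graph rewrites that could reorder float ops
});
```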
### Determinism
The graphHash / `embeddingsHash` contract is preserved:
- Canonical tier ordering (symbol → file → community) is unchanged.
- Rows are still sorted by `(granularity, nodeId, chunkIndex)` before
hashing (see the sketch after this list).
- `openOnnxEmbedder()`'s deterministic knobs are intact per worker —
which input produces which vector is independent of which worker ran it.
- New regression test asserts `embeddingsHash` at `batchSize=1` equals
`embeddingsHash` at `batchSize=32`.
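
A sketch of the sort-then-hash step, assuming a SHA-256 digest and a
hypothetical `EmbeddingRow` shape (neither is confirmed by this PR):

```ts
import { createHash } from "node:crypto";

// Hypothetical row shape -- illustrative.
interface EmbeddingRow {
  granularity: "symbol" | "file" | "community";
  nodeId: string;
  chunkIndex: number;
  vector: Float32Array;
}

const TIER_ORDER = { symbol: 0, file: 1, community: 2 } as const;

function embeddingsHash(rows: EmbeddingRow[]): string {
  // Sort by (granularity, nodeId, chunkIndex) so the digest is independent
  // of which worker produced which vector, and in what order.
  const sorted = [...rows].sort(
    (a, b) =>
      TIER_ORDER[a.granularity] - TIER_ORDER[b.granularity] ||
      (a.nodeId < b.nodeId ? -1 : a.nodeId > b.nodeId ? 1 : 0) ||
      a.chunkIndex - b.chunkIndex,
  );
  const h = createHash("sha256");
  for (const row of sorted) {
    h.update(`${row.granularity}:${row.nodeId}:${row.chunkIndex}:`);
    h.update(
      Buffer.from(row.vector.buffer, row.vector.byteOffset, row.vector.byteLength),
    );
  }
  return h.digest("hex");
}
```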
### Expected speedup
On an M-series laptop with `--embeddings-workers auto
--embeddings-batch-size 32`, the 7-minute AWSQuickWork run should drop
to roughly 1–2 minutes. `--embeddings-int8` cuts that further.
## Test plan
- [x] `pnpm build` — clean
- [x] `pnpm --filter @opencodehub/ingestion test` — 576/576 pass
- [x] New test: `embeddings.test.ts` — `batchSize=1` vs `batchSize=32`
produce byte-identical `embeddingsHash` (sketched below)
- [x] `codehub analyze --help` surfaces `--embeddings-workers` and
`--embeddings-batch-size`
- [ ] End-to-end: run `codehub analyze AWSQuickWork --embeddings --force
--granularity symbol,file,community --embeddings-workers auto` and
confirm wall time drop + identical `embeddingsHash` vs a single-threaded
control run
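
The regression test is roughly this shape, assuming a vitest-style
runner and a hypothetical `runEmbeddings` entry point:

```ts
import { describe, expect, it } from "vitest";
// Hypothetical stand-in for the real phase entry point.
import { runEmbeddings } from "./embeddings";

describe("embeddings determinism", () => {
  it("batchSize=1 and batchSize=32 yield byte-identical embeddingsHash", async () => {
    const serial = await runEmbeddings({ workers: 1, batchSize: 1 });
    const batched = await runEmbeddings({ workers: 4, batchSize: 32 });
    expect(batched.embeddingsHash).toBe(serial.embeddingsHash);
  });
});
```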