perf: replace window-function retrieval with UNION ALL + per-bank HNSW indexes #541
Merged
nicoloboschi merged 5 commits into main on Mar 11, 2026
Conversation
…W indexes

The previous retrieve_semantic_bm25_combined() used ROW_NUMBER() OVER (PARTITION BY fact_type ...), which forced a full sequential scan — pgvector cannot use HNSW indexes when a window function partitions on the same column as the ORDER BY.

Changes:

- retrieval.py: rewrite to a UNION ALL of per-fact_type subqueries; each arm has its own ORDER BY embedding <=> $1 LIMIT n, enabling partial HNSW index scans. Semantic arms over-fetch 5x (min 100) for HNSW approximation; trimmed in Python.
- memory_engine.py: set hnsw.ef_search=200 at pool init (persistent per-connection, no per-query SET/RESET overhead).
- bank_utils.py: add create_bank_hnsw_indexes / drop_bank_hnsw_indexes for per-(bank_id, fact_type) partial HNSW index lifecycle management.
- fact_storage.py / bank_utils.py: create per-bank indexes on fresh bank insert.
- memory_engine.py delete_bank: drop per-bank indexes via DELETE ... RETURNING to avoid a separate round-trip.
- Migration a3b4c5d6e7f8: add interim fact_type-only partial indexes.
- Migration d5e6f7a8b9c0: add internal_id UUID UNIQUE to banks, replace fact_type-only indexes with per-(bank, fact_type) partial HNSW indexes, drop the global idx_memory_units_embedding that competed with them.

Why per-(bank, fact_type) and not just per-fact_type: the idx_memory_units_bank_id B-tree index always wins over fact_type-only partial indexes when bank_id appears in the WHERE clause. Including bank_id in the partial index predicate removes the B-tree from consideration and lets the planner choose HNSW. The global HNSW index must also be dropped to avoid it competing for the larger fact_type partitions (world, observation).
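The rewritten query shape described in the commit message can be sketched as follows. This is an illustrative sketch, not the actual retrieval.py code: table and column names (memory_units, embedding, fact_type) follow the commit message, while the helper name and parameter layout are assumptions.

```python
# Sketch: build a UNION ALL with one arm per fact_type. Each arm carries its
# own ORDER BY embedding <=> $1 LIMIT n, which is the shape pgvector can
# satisfy with a partial HNSW index scan instead of a sequential scan.
FACT_TYPES = ("world", "experience", "observation")

def build_union_all_query(limit: int) -> str:
    # Over-fetch 5x (min 100) per semantic arm to compensate for HNSW
    # approximation loss; the final trim back to `limit` happens in Python.
    fetch = max(limit * 5, 100)
    arms = [
        "(SELECT id, fact_type, embedding <=> $1 AS dist "
        "FROM memory_units "
        f"WHERE bank_id = $2 AND fact_type = '{ft}' "
        "ORDER BY embedding <=> $1 "
        f"LIMIT {fetch})"
        for ft in FACT_TYPES
    ]
    return " UNION ALL ".join(arms)
```

Because each arm is an independent ORDER BY ... LIMIT, the planner can choose a per-(bank, fact_type) partial HNSW index for each one, while the statement still runs on a single connection in a single round-trip.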
Instead of relying on DEFAULT gen_random_uuid() and RETURNING internal_id, generate the UUID in application code before the INSERT. This means we always know the value upfront and can call create_bank_hnsw_indexes immediately without needing a DB round-trip to retrieve the assigned ID. Also adds tests for HNSW index lifecycle and retrieve_semantic_bm25_combined.
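A minimal sketch of this pattern, assuming a simplified banks schema; the index-name derivation (four-letter fact_type abbreviation plus a 16-hex-char UUID suffix) is an assumption modeled on the idx_mu_emb_worl_<uid16> names mentioned elsewhere in this PR, not the project's actual scheme.

```python
import uuid

# Sketch: generate internal_id client-side so the INSERT needs no
# DEFAULT gen_random_uuid() / RETURNING round-trip, and the per-bank
# HNSW index names can be derived immediately.
def prepare_new_bank(bank_id: str) -> tuple[uuid.UUID, str, list[str]]:
    internal_id = uuid.uuid4()  # known upfront, no round-trip to fetch it
    insert_sql = "INSERT INTO banks (bank_id, internal_id) VALUES ($1, $2)"
    uid16 = internal_id.hex[:16]  # short, stable suffix for index names
    abbrevs = ("worl", "expr", "obsv")  # world / experience / observation
    index_names = [f"idx_mu_emb_{ab}_{uid16}" for ab in abbrevs]
    return internal_id, insert_sql, index_names
```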
Migration fixes:

- Add text() wrappers for raw SQL in d5e6f7a8b9c0 (SQLAlchemy 2.0 compat).
- Drop stale fact_type-only partial indexes (idx_mu_emb_world/observation/experience) that may exist from prior migrations on the same DB.

migrations.py fix:

- Skip global HNSW index creation when per-bank partial HNSW indexes already exist on memory_units (idx_mu_emb_* pattern). Without this, the post-migration vector index check detects no %embedding%-named index and recreates the global idx_memory_units_embedding, which defeats the per-bank index strategy.

Verified with EXPLAIN ANALYZE on a 66K-row bank: all three fact_type arms use their per-bank HNSW index scan (idx_mu_emb_worl/expr/obsv_<uid16>).
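The guard described above can be sketched like this; the predicate is paraphrased from the commit message, and the function name and input shape are assumptions rather than the actual migrations.py code.

```python
# Sketch: decide whether the post-migration check should (re)create the
# global HNSW index on memory_units. If any per-bank partial index
# (idx_mu_emb_* naming pattern) exists, skip creation so the global index
# cannot come back and compete with the per-bank strategy.
def should_create_global_hnsw_index(existing_index_names: list[str]) -> bool:
    if any(n.startswith("idx_mu_emb_") for n in existing_index_names):
        return False  # per-bank strategy active; a global index would defeat it
    # Otherwise fall back to the original behavior: create the global index
    # only when no %embedding%-named index is already present.
    return not any("embedding" in n for n in existing_index_names)
```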
Problem
`retrieve_semantic_bm25_combined()` used `ROW_NUMBER() OVER (PARTITION BY fact_type ORDER BY embedding <=> $1)` to get top-N results per fact type. This forced a full sequential scan — pgvector cannot use HNSW indexes when a window function partitions on the ordering column.

Closes #539.
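For contrast, a sketch of the query shape this PR removes (identifiers are illustrative, not the project's actual SQL). The window function ranks all rows per fact_type in one pass, so there is no arm-local ORDER BY ... LIMIT the planner could serve from an HNSW index.

```python
# The pre-PR shape: ROW_NUMBER() partitioned on fact_type over the same
# ordering the HNSW index would need. pgvector cannot serve this plan,
# so every recall scans the whole bank sequentially.
OLD_QUERY = """
WITH ranked AS (
    SELECT id, fact_type,
           ROW_NUMBER() OVER (
               PARTITION BY fact_type
               ORDER BY embedding <=> $1
           ) AS rn
    FROM memory_units
    WHERE bank_id = $2
)
SELECT id, fact_type FROM ranked WHERE rn <= $3
"""
```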
Solution
1. UNION ALL retrieval (`retrieval.py`)

Replace the window-function CTE with a UNION ALL of per-fact_type subqueries. Each arm has its own `ORDER BY embedding <=> $1 LIMIT n`, which gives the query planner a shape it can satisfy with an HNSW index scan. Semantic arms over-fetch 5× (min 100) to compensate for HNSW approximation loss; results are trimmed to `limit` in Python. A single connection and round-trip are preserved (no parallel connections per recall, unlike the approach in PR #540).

2. `hnsw.ef_search=200` at pool init (`memory_engine.py`)

Set once per connection via `asyncpg.create_pool(init=...)` — no per-query SET/RESET overhead. Gracefully ignored if the extension doesn't support it.

3. Per-(bank, fact_type) partial HNSW indexes (`bank_utils.py`, migrations)

Two root causes prevented the planner from using the existing indexes:

- The `idx_memory_units_bank_id` B-tree always wins when `bank_id` is in the WHERE clause.
- The global HNSW index (`idx_memory_units_embedding`) competes with the per-fact_type partials for the larger partitions (world, observation) and wins.

Fix: create per-(bank_id, fact_type) partial HNSW indexes (both predicates in the WHERE clause) and drop the global index. Index naming uses a stable `internal_id UUID UNIQUE` column added to the `banks` table.

Lifecycle:

- Bank creation: `create_bank_hnsw_indexes`
- Bank deletion: `drop_bank_hnsw_indexes`, combined with `DELETE ... RETURNING` to avoid an extra round-trip

4. Migrations

- `a3b4c5d6e7f8`: interim fact_type-only partial HNSW indexes
- `d5e6f7a8b9c0`: adds `internal_id UUID UNIQUE` to banks, replaces fact_type-only indexes with per-(bank, fact_type) partials, drops the global HNSW index, backfills indexes for existing banks

Test plan

- `cd hindsight-api && uv run pytest tests/ -v` — all tests pass
- `uv run hindsight-admin run-db-migration`
- `EXPLAIN (ANALYZE, BUFFERS)` on a bank with 20K+ rows: verify the planner selects `Index Scan using idx_mu_emb_worl_<uid>` (not Seq Scan) for each UNION ALL arm
- Run `./scripts/benchmarks/run-retain-perf.sh` and confirm latency improvement vs baseline
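The per-(bank, fact_type) partial index described above can be sketched as DDL like the following. The operator class and the DDL builder are assumptions (`vector_cosine_ops` matches the `<=>` cosine-distance operator used in retrieval); the real `create_bank_hnsw_indexes` may differ.

```python
# Sketch of the partial HNSW index DDL: both predicates live in the WHERE
# clause, so the idx_memory_units_bank_id B-tree is no longer a competing
# plan and the planner can pick this index for the matching UNION ALL arm.
def partial_hnsw_index_ddl(index_name: str, bank_id: str, fact_type: str) -> str:
    return (
        f"CREATE INDEX {index_name} ON memory_units "
        "USING hnsw (embedding vector_cosine_ops) "
        f"WHERE bank_id = '{bank_id}' AND fact_type = '{fact_type}'"
    )
```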