v1.4.0 — Hybrid retrieval + cross-library benchmarks
Hybrid BM25 + dense retrieval and a cross-library benchmark harness (FAISS + ScaNN). The core ProximaKit target remains Foundation + Accelerate only — no new external dependencies.
Added
- Cross-library benchmark harness (`Benchmarks/`). Standalone SPM package
  `ProximaBench` that compares ProximaKit HNSW against FAISS HNSW and ScaNN
  on identical datasets and identical brute-force ground truth. The core
  `ProximaKit` target stays dependency-free: baselines run in Python and
  all harnesses write a flat JSON schema (see `Benchmarks/JSON_SCHEMA.md`).
  - Swift subcommands: `ground-truth` (exact k-NN via `BruteForceIndex`)
    and `hnsw` (build + timed search + recall@k against GT).
  - Python baselines under `Benchmarks/python/`: `faiss_hnsw.py`,
    `scann_hnsw.py` (auto-skips on unsupported platforms), `compare.py`
    aggregator that emits a Markdown table.
  - Datasets: SIFT1M 100K subset + MS MARCO passages 50K (MiniLM-L6-v2
    embeddings). Idempotent download scripts under `Benchmarks/datasets/`.
  - Metrics: recall@10 vs exact GT, p50/p95 query latency, QPS, build time,
    resident memory (`mach_task_basic_info` on Swift, `psutil` on Python).
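
As a rough illustration of the metrics listed above, recall@k and the latency percentiles could be computed along these lines. This is a minimal sketch, not the harness code: `recallAtK` and `percentile` are hypothetical names, the percentile uses the nearest-rank definition, and the harness may define either metric differently.

```swift
/// recall@k: share of the exact top-k neighbours that the approximate search
/// also returned, averaged over all queries. (Hypothetical helper.)
func recallAtK(approx: [[Int]], exact: [[Int]], k: Int) -> Double {
    precondition(k > 0 && !exact.isEmpty && approx.count == exact.count)
    var total = 0.0
    for (found, truth) in zip(approx, exact) {
        let truthSet = Set(truth.prefix(k))
        precondition(!truthSet.isEmpty, "ground truth must not be empty")
        let hits = Set(found.prefix(k)).intersection(truthSet).count
        total += Double(hits) / Double(truthSet.count)
    }
    return total / Double(exact.count)
}

/// Nearest-rank percentile over per-query latencies, with p in (0, 100].
/// (Hypothetical helper; other percentile definitions interpolate instead.)
func percentile(_ samples: [Double], _ p: Double) -> Double {
    precondition(!samples.isEmpty && p > 0 && p <= 100)
    let sorted = samples.sorted()
    let rank = Int((p / 100 * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

// Usage with made-up numbers: two queries, k = 3.
let recall = recallAtK(approx: [[1, 2, 3], [4, 5, 6]],
                       exact: [[1, 2, 9], [4, 5, 6]], k: 3)
let latenciesMs: [Double] = [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(recall, percentile(latenciesMs, 50), percentile(latenciesMs, 95))
```

Computing recall against the same brute-force ground truth for every library is what makes the numbers comparable across harnesses.
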
- `docs/BENCHMARKS.md`: "Cross-Library Comparison" section with
  design rules, dataset table, metrics table, and end-to-end reproduction
  steps that call the harness binaries directly.
- `docs/adr/ADR-005-benchmark-methodology.md` documenting why the
  baselines live out-of-process and why `Benchmarks/` is a separate SPM
  package rather than a target of `Package.swift`.
- CI: `.github/workflows/benchmark.yml`. Smoke slice (SIFT1M 10K) runs
  on every PR that touches `Sources/ProximaKit/**` or the harness. Full
  slice (100K) runs nightly. Results (per-library JSON + aggregated
  `compare.md`) are uploaded as workflow artifacts.
- Hybrid retrieval (BM25 + dense). Three new public types in the core
  `ProximaKit` target, sibling to the existing dense-only stack:
  - `SparseIndex`: BM25 actor (`SparseVectorIndex` protocol), Okapi scoring
    with Lucene-style `log(1 + (N − df + 0.5) / (df + 0.5))` IDF, configurable
    `k1`/`b`, tombstoning + auto-compaction matching `HNSWIndex`.
  - `HybridIndex`: concurrent fan-out over a dense `VectorIndex` and a
    `SparseVectorIndex`, with `HybridFusionStrategy` = `.rrf(k:)` (default,
    `k = 60`) or `.weightedSum(alpha:)`.
  - `HybridVectorStore`: sibling of `VectorStore` with the same
    `addChunks` / `query` / `removeDocument` / `save` shape. Persists both
    legs side-by-side (`index.pxkt` + `index.pxbm`).
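
To make the scoring formula above concrete, here is a minimal sketch of a per-term Okapi BM25 score using the Lucene-style IDF. It is not the `SparseIndex` implementation: `BM25Params`, `idf`, and `bm25TermScore` are hypothetical names, the defaults `k1 = 1.2` and `b = 0.75` are common textbook values rather than confirmed ProximaKit defaults, and the classic `(k1 + 1)` numerator is assumed.

```swift
import Foundation

/// Hypothetical parameter bundle; the real `BM25Configuration` may differ.
struct BM25Params {
    var k1 = 1.2   // term-frequency saturation (assumed default)
    var b = 0.75   // document-length normalisation strength (assumed default)
}

/// Lucene-style IDF: log(1 + (N − df + 0.5) / (df + 0.5)).
/// The `1 +` keeps the value positive even when df > N / 2.
func idf(docCount n: Int, docFreq df: Int) -> Double {
    log(1 + (Double(n) - Double(df) + 0.5) / (Double(df) + 0.5))
}

/// Contribution of a single query term to a single document's BM25 score.
func bm25TermScore(tf: Int, docLength: Int, avgDocLength: Double,
                   docCount: Int, docFreq: Int,
                   params: BM25Params = BM25Params()) -> Double {
    guard tf > 0, avgDocLength > 0 else { return 0 }
    let f = Double(tf)
    // Documents longer than average are penalised in proportion to b.
    let norm = 1 - params.b + params.b * Double(docLength) / avgDocLength
    return idf(docCount: docCount, docFreq: docFreq)
        * f * (params.k1 + 1) / (f + params.k1 * norm)
}

// Usage with made-up corpus statistics: 1,000 documents, term in 10 of them.
let score = bm25TermScore(tf: 3, docLength: 120, avgDocLength: 100,
                          docCount: 1_000, docFreq: 10)
print(score)
```

A document's full score is the sum of this term score over all query terms, so raising `k1` lets repeated terms count for more and raising `b` penalises long documents more strongly.
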
- `BM25Tokenizer` protocol with `DefaultBM25Tokenizer`: Unicode word-break
  segmentation + lowercasing, no NaturalLanguage dependency. Bring-your-own
  tokenizer for language-aware tokenization (e.g. Lumen's `NLTokenizer`).
- `BM25Configuration` with `k1`, `b`, `autoCompactionThreshold` knobs.
- `.pxbm` binary persistence for `SparseIndex` via an extension on
  `PersistenceEngine`. Same header / offset layout conventions as
  `.pxkt`; compacts tombstones on save.
- `docs/HYBRID.md`: hybrid retrieval design, fusion-strategy rationale,
  Lumen opt-in snippet.
- 40 new tests across `SparseIndexTests`, `DefaultBM25TokenizerTests`,
  `HybridIndexTests`, and `HybridVectorStoreTests`, including a 1K-doc BM25
  parity check against an oracle implementation and the RRF
  `top-k ⊇ (dense ∩ sparse)` invariant on constructed cases.
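
The RRF invariant mentioned in the test entry can be illustrated with a small sketch of reciprocal rank fusion. This is an illustration only, not the `HybridIndex` code: `rrfFuse` is a hypothetical free function, ranks are assumed to start at 1, and each input list is assumed to contain unique ids.

```swift
/// Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank(d)),
/// with ranks starting at 1. (Hypothetical helper.)
func rrfFuse(_ rankings: [[String]], k: Double = 60) -> [(id: String, score: Double)] {
    var scores: [String: Double] = [:]
    for ranking in rankings {
        for (index, id) in ranking.enumerated() {
            scores[id, default: 0] += 1 / (k + Double(index + 1))
        }
    }
    // Descending score; ties broken by id so the output is deterministic.
    return scores.map { (id: $0.key, score: $0.value) }
        .sorted { $0.score != $1.score ? $0.score > $1.score : $0.id < $1.id }
}

// Constructed case: each leg returns its own top 3.
let dense = ["a", "b", "c"]
let sparse = ["c", "a", "d"]
let fused = rrfFuse([dense, sparse])

// Invariant: every id present in both legs survives into the fused top-k.
let topK = Set(fused.prefix(3).map { $0.id })
let inBoth = Set(dense).intersection(sparse)
print(inBoth.isSubset(of: topK))
```

With the constant at 60 and each leg truncated to k results, an id found in both legs scores at least 2 / (60 + k), while an id found in only one leg scores at most 1 / 61, so the invariant holds whenever k < 62.
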
Changed
- `.gitignore` now tracks `Benchmarks/` sources but ignores the on-demand
  `Benchmarks/datasets/` payloads and `Benchmarks/out/` run artifacts.
- `docs/ADR-006-lumen-integration.md`: new addendum covering the hybrid
  opt-in path. The v1.1 `VectorStore` contract is unchanged.
Fixed
- `SparseIndexTests.testBM25ParityAgainstOracle` no longer flakes when BM25
  score ties straddle the top-k truncation boundary. Both the oracle and
  `SparseIndex` are queried with `k + 50` and the assertion walks fully
  realized score buckets until it covers the top-k window. BM25 makes no
  tie-break guarantee, so the test now verifies only what parity actually
  demands (score agreement across the top-k window).
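
One way to picture the bucket-walking assertion described above is the sketch below. It is an interpretation under stated assumptions, not the actual test: `Hit`, `scoreBuckets`, and `agreesOnTopK` are hypothetical names, both inputs are assumed to be sorted by descending score, and the last bucket of each list is always treated as possibly truncated by the over-fetch limit.

```swift
struct Hit {
    let id: Int
    let score: Double
}

/// Groups hits (sorted by descending score) into runs of equal score.
func scoreBuckets(_ hits: [Hit], tolerance: Double = 1e-9) -> [(score: Double, ids: Set<Int>)] {
    var buckets: [(score: Double, ids: Set<Int>)] = []
    for hit in hits {
        if let last = buckets.last, abs(last.score - hit.score) <= tolerance {
            buckets[buckets.count - 1].ids.insert(hit.id)
        } else {
            buckets.append((score: hit.score, ids: [hit.id]))
        }
    }
    return buckets
}

/// True when both lists agree, bucket by bucket, until at least k hits are
/// covered. Order inside a bucket is ignored because ties carry no ordering
/// guarantee. Returns false if the complete buckets never reach k.
func agreesOnTopK(_ lhs: [Hit], _ rhs: [Hit], k: Int, tolerance: Double = 1e-9) -> Bool {
    let a = scoreBuckets(lhs, tolerance: tolerance)
    let b = scoreBuckets(rhs, tolerance: tolerance)
    // The final bucket of either list may be cut off, so skip it.
    let complete = max(min(a.count, b.count) - 1, 0)
    var covered = 0
    for i in 0..<complete {
        guard abs(a[i].score - b[i].score) <= tolerance, a[i].ids == b[i].ids else {
            return false
        }
        covered += a[i].ids.count
        if covered >= k { return true }
    }
    return false
}

// Ids 2 and 3 tie at 2.0 and come back in a different order from each side.
let oracle = [Hit(id: 1, score: 3), Hit(id: 2, score: 2), Hit(id: 3, score: 2), Hit(id: 4, score: 1)]
let index = [Hit(id: 1, score: 3), Hit(id: 3, score: 2), Hit(id: 2, score: 2), Hit(id: 5, score: 1)]
print(agreesOnTopK(oracle, index, k: 2))
```

A strict position-by-position comparison at k = 2 would reject this pair because of the tie, which is the kind of flake the fix removes.
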