Skip to content

v1.4.0 — Hybrid retrieval + cross-library benchmarks

Choose a tag to compare

@vivekptnk vivekptnk released this 09 Jun 22:16
· 25 commits to main since this release

Hybrid BM25 + dense retrieval and a cross-library benchmark harness (FAISS + ScaNN). The core ProximaKit target remains Foundation + Accelerate only — no new external dependencies.

Added

  • Cross-library benchmark harness (Benchmarks/). Standalone SPM package
    ProximaBench that compares ProximaKit HNSW against FAISS HNSW and ScaNN
    on identical datasets and identical brute-force ground truth. The core
    ProximaKit target stays dependency-free — baselines run in Python and
    all harnesses write a flat JSON schema (see Benchmarks/JSON_SCHEMA.md).

    • Swift subcommands: ground-truth (exact k-NN via BruteForceIndex)
      and hnsw (build + timed search + recall@k against GT).
    • Python baselines under Benchmarks/python/: faiss_hnsw.py,
      scann_hnsw.py (auto-skips on unsupported platforms), compare.py
      aggregator that emits a Markdown table.
    • Datasets: SIFT1M 100K subset + MS MARCO passages 50K (MiniLM-L6-v2
      embeddings). Idempotent download scripts under Benchmarks/datasets/.
    • Metrics: recall@10 vs exact GT, p50/p95 query latency, QPS, build time,
      resident memory (mach_task_basic_info on Swift, psutil on Python).
  • docs/BENCHMARKS.md — "Cross-Library Comparison" section with
    design rules, dataset table, metrics table, and end-to-end reproduction
    steps that call the harness binaries directly.

  • docs/adr/ADR-005-benchmark-methodology.md documenting why the
    baselines live out-of-process and why Benchmarks/ is a separate SPM
    package rather than a target of Package.swift.

  • CI: .github/workflows/benchmark.yml. Smoke slice (SIFT1M 10K) runs
    on every PR that touches Sources/ProximaKit/** or the harness. Full
    slice (100K) runs nightly. Results (per-library JSON + aggregated
    compare.md) are uploaded as workflow artifacts.

  • Hybrid retrieval (BM25 + dense). Three new public types in the core
    ProximaKit target, sibling to the existing dense-only stack:

    • SparseIndex — BM25 actor (SparseVectorIndex protocol), Okapi scoring
      with Lucene-style log(1 + (N − df + 0.5) / (df + 0.5)) IDF, configurable
      k1 / b, tombstoning + auto-compaction matching HNSWIndex.
    • HybridIndex — concurrent fan-out over a dense VectorIndex and a
      SparseVectorIndex, with HybridFusionStrategy = .rrf(k:) (default,
      k = 60) or .weightedSum(alpha:).
    • HybridVectorStore — sibling of VectorStore with the same
      addChunks / query / removeDocument / save shape. Persists both
      legs side-by-side (index.pxkt + index.pxbm).
  • BM25Tokenizer protocol with DefaultBM25Tokenizer — Unicode word-break
    segmentation + lowercasing, no NaturalLanguage dependency. Bring-your-own
    tokenizer for language-aware tokenization (e.g. Lumen's NLTokenizer).

  • BM25Configuration with k1, b, autoCompactionThreshold knobs.

  • .pxbm binary persistence for SparseIndex via an extension on
    PersistenceEngine. Same header / offset layout conventions as
    .pxkt; compacts tombstones on save.

  • docs/HYBRID.md — hybrid retrieval design, fusion-strategy rationale,
    Lumen opt-in snippet.

  • 40 new tests across SparseIndexTests, DefaultBM25TokenizerTests,
    HybridIndexTests, and HybridVectorStoreTests, including a 1K-doc BM25
    parity check against an oracle implementation and the RRF
    top-k ⊇ (dense ∩ sparse) invariant on constructed cases.

Changed

  • .gitignore now tracks Benchmarks/ sources but ignores the on-demand
    Benchmarks/datasets/ payloads and Benchmarks/out/ run artifacts.
  • docs/ADR-006-lumen-integration.md — new addendum covering the hybrid
    opt-in path. The v1.1 VectorStore contract is unchanged.

Fixed

  • SparseIndexTests.testBM25ParityAgainstOracle no longer flakes when BM25
    score ties straddle the top-k truncation boundary. Both the oracle and
    SparseIndex are queried with k + 50 and the assertion walks fully
    realized score buckets until it covers the top-k window — BM25 makes no
    tie-break guarantee, so the test now verifies only what parity actually
    demands (score agreement across the top-k window).