Release v1.5.0 · vivekptnk/ProximaKit

Correctness fixes from a multi-agent audit (every fix reproduced before patching, re-verified after), INT8 scalar quantization, three new distance metrics, reproducible graph construction, and a CI overhaul.

Added

Multiplatform demo app: ProximaDemoApp now targets iPhone, iPad,
macOS, and visionOS from one SwiftUI target (compact widths get a
search-first tab layout; AppKit image loading replaced with ImageIO).
The persisted demo index is validated against the current embedder's
dimension before reuse — NLEmbedding can pin sentence (512d) or
word-averaging (300d) mode depending on which language assets the OS
has, and a stale-dimension index made every search silently empty.
Dimension mismatches now surface as an actionable error instead.
A -demoQuery launch argument supports screenshot automation.
INT8 scalar quantization (ADR-007). ScalarQuantizer — symmetric
per-vector scaling (scale = maxAbs / 127, explicit zero-vector handling)
— plus the ScalarQuantizedHNSWIndex actor. ~4× vector-storage reduction
(384d: 1,536 B → 388 B per vector), no training phase, and search runs
through the configured DistanceMetric, so any serialisable metric works
(contrast with PQ's L2-only ADC). Two-phase build (full-precision graph
construction, then encode), binary persistence, memory introspection
(codeStorageBytes / memorySavingsRatio), and acceptance-tested recall
floors: Recall@10 ≥ 0.95 (euclidean) / ≥ 0.93 (cosine) against brute-force
ground truth. Design rationale in
ADR-007.
Three new distance metrics: ChebyshevDistance (L∞),
BrayCurtisDistance, and MahalanobisDistance (constructible from a
covariance or inverse-covariance matrix). Chebyshev and Bray-Curtis join
DistanceMetricType and persist with any index; Mahalanobis is search-only
(not serialisable), and persistenceSnapshot() reports it as
PersistenceError.unserializableMetric rather than guessing.
HNSWConfiguration.levelSeed — seeds the layer-assignment RNG so graph
construction is reproducible: the same insertion sequence yields the same
topology. Build-time knob only; deliberately not persisted.
Persistence corruption-hardening test matrix — 42 tests across all four
binary codecs, covering truncated sections, out-of-range graph indices,
invalid entry points, and bad configuration values.
DocC published to GitHub Pages on every push to main (docs.yml),
and automatic GitHub Releases with CHANGELOG-extracted notes on version
tags (release.yml).
CI overhaul: SwiftLint job (pinned 0.63.2, strict config), iOS Simulator
build job for ProximaKit + ProximaEmbeddings, release tag/version/
changelog consistency check, benchmark regression gate wired to
compare.py, SIFT1M SHA-256 verification, and fixed SwiftPM caching.
ADRs: ADR-007 (INT8
scalar quantization — accepted),
ADR-008 (filtered search —
retrospective), ADR-010 (format
evolution policy), ADR-011 (PQ codec —
retrospective). ADR-006 moved into docs/adr/ with its siblings.

Changed

NLEmbeddingProvider sentence embeddings are now L2-normalized, matching
the word-averaging fallback path (previously only the fallback normalized).
Every vector the provider returns now has unit magnitude. Migration:
indexes persisted from pre-1.5 unnormalized sentence vectors will rank
differently under DotProductDistance/EuclideanDistance when queried with
the new unit-length vectors — re-embed and rebuild those indexes, or pin to
v1.4.x until you can. (CosineDistance users are unaffected.)
On-disk format v2. autoCompactionThreshold now survives a save/load
round-trip. Format v1 files still load — see
ADR-010 for the evolution policy.
HNSWConfiguration rejects m < 2 (m == 1 yields an infinite level
multiplier and trapped on the first add).
ProximaKit.version now reports the actual release (was stuck at 1.0.0);
a consistency test and a release-workflow check keep it that way.

Fixed

Critical: tombstone liveness is now identity-based. Liveness was
presence-based (uuidToNode[uuid] != nil), which breaks after re-adding an
existing UUID: the old tombstoned slot looked live because the UUID resolves
to the new node. Search could return stale vectors/metadata, entry-point
recovery could select a disconnected tombstone (collapsing the graph), and
compact() resurrected deleted vector bodies. Affected HNSWIndex,
QuantizedHNSWIndex, and SparseIndex; reproduced 20/20 pre-fix and locked
in by TombstoneLivenessTests.
Batch cosine zero-vector parity. The batch fast path returned distance
0 (perfect match) for zero-magnitude vectors where scalar CosineDistance
returns 1.0 (neutral) — degenerate embeddings ranked as top hits in batch
paths. Both zero-query and zero-stored-vector now return 1.0.
Store reentrancy. VectorStore.save() no longer loses concurrent
addChunks dirty-flag updates across its suspension point;
HybridVectorStore two-leg saves can no longer persist diverged
dense/sparse files; removeDocument() closed its orphan window; document-map
writes are atomic.
Persistence loaders validate before trusting. Graph indices, entry
points, levels, and configuration ranges are checked on load, throwing typed
PersistenceError instead of crashing on corrupt or hostile files.
QuantizedHNSWIndex.build no longer misaligns PQ codes/metadata when the
input contains duplicate ids; HNSW remove() now repairs dangling incoming
edges; .weightedSum fusion validates alpha ∈ [0, 1].
DefaultBM25Tokenizer dropped locale-sensitive lowercasing — tokenization
is now deterministic regardless of device locale, per its contract.
CoreMLEmbeddingProvider now conforms to EmbeddingProvider /
TextEmbedder as documented, so it plugs into VectorStore directly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.5.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Added

Changed

Fixed

Uh oh!