v0.7.1
ExDataSketch v0.7.1 Release Notes
Release date: 2026-03-23
Summary
v0.7.1 is a performance and correctness release that moves hash computation
into Rust NIF batch calls, adds configurable hash functions and seeds, fixes a
quotient filter wrap-around bug, and adds merge-time hash compatibility
validation across all hash-based sketches.
What's new in v0.7.1
NIF batch hashing (94.6% memory reduction)
The update_many path for HLL, ULL, Theta, and CMS previously hashed every
item in Elixir before sending packed hashes to the NIF. For 10M items this
created ~30M+ transient heap allocations (term_to_binary + xxhash3 NIF
bigint + binary encoding per item).
v0.7.1 sends raw item binaries directly to Rust via ListIterator for
zero-copy Erlang list iteration. Rust decodes items, hashes each with xxhash3
in-loop, and updates registers -- zero per-item Elixir heap allocation beyond
the initial list.
| Items | Before (memory) | After (memory) | Reduction |
|---|---|---|---|
| 100K | 28 MB | 1.5 MB | 94.6% |
| 1M | 281 MB | 15 MB | 94.7% |
| 10M | 2.75 GB | 148 MB | 94.6% |
The existing hash-based NIF path is preserved for custom :hash_fn users and
backward compatibility.
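To make the per-item cost of the pre-v0.7.1 path concrete, here is a minimal, self-contained sketch of the three steps described above (term encoding, hashing, 8-byte packing). It is not the library's code: `PackedHashSketch`, `pack_hashes/1`, the MD5 stand-in for the xxhash3 NIF, and the little-endian packing are all assumptions made so the example runs without dependencies.

```elixir
defmodule PackedHashSketch do
  # Stand-in 64-bit hash: the first 8 bytes of MD5. The library hashes with
  # xxhash3 through a NIF; MD5 is an assumption used only to stay dependency-free.
  defp hash64(bin) do
    <<h::unsigned-64, _rest::binary>> = :erlang.md5(bin)
    h
  end

  # Shape of the old path: every item yields an encoded binary, an integer
  # hash, and an 8-byte chunk, all created on the Elixir side before the
  # packed binary is handed to native code.
  def pack_hashes(items) do
    for item <- items, into: <<>> do
      encoded = :erlang.term_to_binary(item)
      # Byte order is an assumption of this sketch.
      <<hash64(encoded)::unsigned-little-64>>
    end
  end
end
```

Each item contributes three short-lived terms here, which is where the roughly 3x-per-item allocation count comes from. In the v0.7.1 path none of this work happens in Elixir; the item list is passed to the NIF as is.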
Custom hash functions and seeds (#198)
All four hash-based sketches (HLL, ULL, Theta, CMS) now accept :hash_fn and
:seed options:
# Custom seed for reproducible hashing
hll = ExDataSketch.HLL.new(seed: 42)
# Custom hash function (disables NIF batch path)
hll = ExDataSketch.HLL.new(hash_fn: fn item -> MyHash.hash(item) end)
Seed values are propagated through serialization/deserialization and validated
at merge time.
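As a rough illustration of what carrying a seed through serialization can look like, the sketch below round-trips a seed through a made-up binary layout and falls back to seed 0 for a layout that has no seed field. The `"SK"` tag, the version bytes, and the `SeedCodecSketch` module are hypothetical; this is not the EXSK format.

```elixir
defmodule SeedCodecSketch do
  # Hypothetical layouts (NOT the real EXSK format):
  #   version 1: <<"SK", 1, payload::binary>>                         no seed stored
  #   version 2: <<"SK", 2, seed::unsigned-little-64, payload::binary>>
  def encode(%{seed: seed, payload: payload}) do
    <<"SK", 2, seed::unsigned-little-64, payload::binary>>
  end

  # Newer binaries carry the seed explicitly.
  def decode(<<"SK", 2, seed::unsigned-little-64, payload::binary>>) do
    {:ok, %{seed: seed, payload: payload}}
  end

  # Older binaries have no seed field, so the decoder assumes seed 0.
  def decode(<<"SK", 1, payload::binary>>) do
    {:ok, %{seed: 0, payload: payload}}
  end

  def decode(_other), do: {:error, :invalid_binary}
end
```

Defaulting in the decoder rather than in callers keeps every later step, including merge-time checks, working with an explicit seed value.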
Merge hash-compatibility validation (#205)
merge/2 on HLL, ULL, Theta, and CMS now validates that both sketches use the
same hash strategy and seed. Merging sketches with different hashing
configurations raises IncompatibleSketchesError instead of producing silently
corrupted results:
a = ExDataSketch.HLL.new(seed: 1)
b = ExDataSketch.HLL.new(seed: 2)
ExDataSketch.HLL.merge(a, b)
# ** (ExDataSketch.Errors.IncompatibleSketchesError) HLL seed mismatch: 1 vs 2
Merging sketches with custom :hash_fn is explicitly rejected since hash
function compatibility cannot be verified at runtime.
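The checks described in this section can be sketched in a few lines. The module below is a simplified stand-in, not the library's implementation: the struct fields, the `:default`/`:custom` strategy atoms, the local `IncompatibleError`, and the one-byte-per-register layout are assumptions chosen for illustration.

```elixir
defmodule MergeCheckSketch do
  # Local stand-in for the library's error type.
  defmodule IncompatibleError do
    defexception [:message]
  end

  # Hypothetical sketch state: one byte per register.
  defstruct seed: 0, hash_strategy: :default, registers: <<>>

  def new(opts \\ []), do: struct!(__MODULE__, opts)

  def merge(%__MODULE__{} = a, %__MODULE__{} = b) do
    cond do
      # Two anonymous functions cannot be proven equivalent, so reject outright.
      a.hash_strategy == :custom or b.hash_strategy == :custom ->
        raise IncompatibleError, message: "custom :hash_fn sketches cannot be merged"

      a.seed != b.seed ->
        raise IncompatibleError, message: "seed mismatch: #{a.seed} vs #{b.seed}"

      byte_size(a.registers) != byte_size(b.registers) ->
        raise IncompatibleError, message: "register count mismatch"

      true ->
        %{a | registers: max_bytes(a.registers, b.registers)}
    end
  end

  # HLL-style merge: keep the larger value of each register pair.
  defp max_bytes(x, y) do
    :binary.bin_to_list(x)
    |> Enum.zip(:binary.bin_to_list(y))
    |> Enum.map(fn {p, q} -> max(p, q) end)
    |> :binary.list_to_bin()
  end
end
```

Validating before touching any register means a failed merge leaves both inputs untouched, which is what makes raising preferable to returning a silently wrong estimate.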
Quotient filter wrap-around fix (#203, #204)
Both Pure and Rust backends had a bug in extract_all where clusters wrapping
from slot N-1 to slot 0 caused nil quotients (Pure crash) or silent data
corruption (Rust). The fix correctly tracks the current quotient across array
boundary wrap-around.
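For readers unfamiliar with the failure mode, the following is a simplified model of walking a quotient filter whose cluster wraps from the last slot to slot 0. It is a textbook-style sketch under stated assumptions, not the library's extract_all: slots are `nil` (empty) or maps with `occupied`, `continuation`, `shifted`, and `remainder` keys, and at least one slot is assumed to be empty or unshifted.

```elixir
defmodule QuotientWalkSketch do
  # Returns {quotient, remainder} pairs in cluster order.
  def extract_all([_ | _] = slots) do
    n = length(slots)
    table = List.to_tuple(slots)

    # Begin at an empty slot or a cluster start so no run is split in two.
    start =
      Enum.find(0..(n - 1), fn i ->
        slot = elem(table, i)
        slot == nil or not slot.shifted
      end)

    if start == nil, do: raise(ArgumentError, "no unshifted slot to start from")

    {_queue, _quotient, acc} =
      Enum.reduce(0..(n - 1), {:queue.new(), nil, []}, fn k, {queue, quotient, acc} ->
        # Modular indexing lets the walk continue from slot n - 1 to slot 0.
        idx = rem(start + k, n)

        case elem(table, idx) do
          nil ->
            {queue, quotient, acc}

          slot ->
            # An occupied bit marks idx as the canonical slot of a pending run.
            queue = if slot.occupied, do: :queue.in(idx, queue), else: queue

            # A non-continuation slot starts a run owned by the oldest pending
            # quotient; continuation slots keep the quotient carried so far,
            # including across the array boundary.
            {quotient, queue} =
              if slot.continuation do
                {quotient, queue}
              else
                {{:value, q}, rest} = :queue.out(queue)
                {q, rest}
              end

            {queue, quotient, [{quotient, slot.remainder} | acc]}
        end
      end)

    Enum.reverse(acc)
  end
end
```

With four slots where quotient 3 holds remainders 10 and 11 (the second one spilling into slot 0) and quotient 0 holds remainder 12 (pushed to slot 1), the walk yields `[{3, 10}, {3, 11}, {0, 12}]`. The continuation entry in slot 0 keeps quotient 3 because the current quotient lives in the accumulator rather than being derived from the slot index.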
Pure backend optimizations
- HLL update_many: Pre-aggregate map with sorted binary splice replaces
  tuple-based per-hash full-tuple copies, reducing transient allocation from
  O(n * m) to O(n + m).
- Batch path restoration: HLL, ULL, and Theta Pure backend update_many
  restored to use chunk + batch *_update_many instead of per-item *_update.
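The pre-aggregation idea in the first item can be sketched as follows. This is an illustration of the general technique, not the Pure backend's code: `PreAggregateSketch`, the one-byte-per-register binary, and the `{index, rank}` update shape are assumptions.

```elixir
defmodule PreAggregateSketch do
  # registers: binary with one byte per register (assumed layout).
  # updates: list of {register_index, rank} pairs derived from item hashes.
  def apply_updates(registers, updates) when is_binary(registers) do
    # Pass 1: collapse n updates into at most m {index, max_rank} entries.
    maxima =
      Enum.reduce(updates, %{}, fn {idx, rank}, acc ->
        Map.update(acc, idx, rank, &max(&1, rank))
      end)

    # Pass 2: visit touched indexes in ascending order, reusing untouched
    # spans as sub-binaries and rewriting only the touched bytes.
    {chunks, pos} =
      maxima
      |> Enum.sort()
      |> Enum.reduce({[], 0}, fn {idx, rank}, {chunks, pos} ->
        untouched = binary_part(registers, pos, idx - pos)
        <<old>> = binary_part(registers, idx, 1)
        {[chunks, untouched, <<max(old, rank)>>], idx + 1}
      end)

    tail = binary_part(registers, pos, byte_size(registers) - pos)
    IO.iodata_to_binary([chunks, tail])
  end
end
```

The register array is rebuilt once per batch instead of once per hash, which is the difference between copying m registers n times and copying them a single time after an n-step aggregation.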
Test infrastructure
- 39 new tests covering deserialization edge cases, custom hash_fn paths,
  Rust-only helper functions, and merge hash-compatibility validation.
- Configurable coverage baselines via EX_DATA_SKETCH_COVERAGE_BASELINE env
  var for separate pure-only and Rust CI coverage thresholds.
- Coverage reporting added to test-rust CI job.
- Hash-dependent vector tests tagged with @tag :rust_nif for correct
  pure-only CI behavior.
Closed Issues
- #198 -- Wire :hash_fn and :seed options through HLL, ULL, Theta, CMS
- #202 -- Move hashing into NIF batch calls via ListIterator
- #203 -- Quotient filter Pure backend wrap-around bug
- #204 -- Quotient filter Rust backend wrap-around bug
- #205 -- Merge hash-compatibility validation
Installation
def deps do
  [
    {:ex_data_sketch, "~> 0.7.1"}
  ]
end
Precompiled Rust NIF binaries are downloaded automatically on supported
platforms (macOS ARM64/x86_64, Linux x86_64/aarch64 glibc/musl). No Rust
toolchain required. The library works in pure Elixir mode on all other
platforms.
Upgrade Notes
- No breaking changes from v0.7.0.
- EXSK binaries produced by earlier v0.x releases remain fully compatible.
- Sketches serialized without a seed default to seed 0 on deserialization,
  preserving backward compatibility.
- The NIF batch hashing path is used automatically when no custom :hash_fn
  is set. Custom hash function users see no behavior change.
- CMS merge/2 now validates width, depth, and counter_width explicitly
  instead of comparing full option keyword lists.
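The CMS change in the last item is easier to see with a small example. The sketch below shows why comparing whole keyword lists is fragile and what a field-by-field check looks like; `CmsCompatSketch` and the sample option lists are hypothetical and do not mirror the library's internals.

```elixir
defmodule CmsCompatSketch do
  # Only these options determine the shape of the counter table.
  @dimension_keys [:width, :depth, :counter_width]

  # Field-by-field check: option order and unrelated options are ignored.
  def compatible?(opts_a, opts_b) do
    Enum.all?(@dimension_keys, fn key ->
      Keyword.get(opts_a, key) == Keyword.get(opts_b, key)
    end)
  end

  # Whole-list equality: sensitive to ordering and to any extra option.
  def same_keyword_list?(opts_a, opts_b), do: opts_a == opts_b
end
```

For `[width: 1024, depth: 4, counter_width: 32]` and `[depth: 4, counter_width: 32, width: 1024, seed: 0]`, `same_keyword_list?/2` returns `false` even though the counter tables have identical dimensions, while `compatible?/2` returns `true`.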
What's Next
v0.8.0 directions under consideration include Binary Fuse Filters for even
smaller static membership filters, Ribbon Filters for space-optimal static
filters, and expanded Apache DataSketches interop for cross-language sketch
exchange.