v0.7.1

github-actions released this 23 Mar 21:42 (commit b50effc)

ExDataSketch v0.7.1 Release Notes

Release date: 2026-03-23

Summary

v0.7.1 is a performance and correctness release that moves hash computation
into Rust NIF batch calls, adds configurable hash functions and seeds, fixes a
quotient filter wrap-around bug, and adds merge-time hash compatibility
validation across all hash-based sketches.

What's new in v0.7.1

NIF batch hashing (94.6% memory reduction)

The update_many path for HLL, ULL, Theta, and CMS previously hashed every
item in Elixir before sending packed hashes to the NIF. For 10M items this
created more than 30M transient heap allocations, roughly three per item
(the term_to_binary output, the bigint returned by the xxhash3 NIF, and the
binary encoding of that hash).
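
To make the per-item cost concrete, the previous path can be sketched roughly
as below. This is an illustration only, not the library's code: the module
name OldPathSketch, the little-endian 64-bit packing, and the use of
:erlang.md5/1 as a stand-in for the xxhash3 NIF are all assumptions.

```elixir
defmodule OldPathSketch do
  # Hypothetical reconstruction of "hash in Elixir, then pack for the NIF".
  def pack_hashes(items) do
    items
    |> Enum.map(fn item ->
      # Allocation 1: the external term format binary for the item.
      bin = :erlang.term_to_binary(item)
      # Allocation 2: a 64-bit hash, which is typically a heap-allocated bignum.
      hash = stand_in_hash64(bin)
      # Allocation 3: the 8-byte encoding of that hash.
      <<hash::unsigned-little-64>>
    end)
    |> IO.iodata_to_binary()
  end

  # Stand-in only: the first 8 bytes of an MD5 digest, NOT xxhash3.
  defp stand_in_hash64(bin) do
    <<hash::unsigned-64, _rest::binary>> = :erlang.md5(bin)
    hash
  end
end
```

Every item passes through all three steps before the NIF sees any data, which
is where the three-allocations-per-item figure comes from.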

v0.7.1 sends raw item binaries directly to Rust via ListIterator for
zero-copy Erlang list iteration. Rust decodes items, hashes each with xxhash3
in-loop, and updates registers -- zero per-item Elixir heap allocation beyond
the initial list.

Items   Before (memory)   After (memory)   Reduction
100K    28 MB             1.5 MB           94.6%
1M      281 MB            15 MB            94.7%
10M     2.75 GB           148 MB           94.6%

The existing hash-based NIF path is preserved for custom :hash_fn users and
backward compatibility.

Custom hash functions and seeds (#198)

All four hash-based sketches (HLL, ULL, Theta, CMS) now accept :hash_fn and
:seed options:

# Custom seed for reproducible hashing
hll = ExDataSketch.HLL.new(seed: 42)

# Custom hash function (disables NIF batch path)
hll = ExDataSketch.HLL.new(hash_fn: fn item -> MyHash.hash(item) end)

Seed values are propagated through serialization/deserialization and validated
at merge time.
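
The MyHash module in the example above is not part of ExDataSketch and is not
defined in these notes. A minimal stand-in might look like the following. It
assumes that :hash_fn should return an unsigned 64-bit integer, which these
notes do not state, and it uses the first 8 bytes of an MD5 digest purely as
a placeholder hash.

```elixir
defmodule MyHash do
  # Hypothetical user-supplied hash module; the return contract
  # (an integer in 0..2^64-1) is an assumption, not documented behavior.
  def hash(item) do
    <<hash::unsigned-64, _rest::binary>> =
      item
      |> :erlang.term_to_binary()
      |> :erlang.md5()

    hash
  end
end
```

Whatever function is supplied must be deterministic for a given item;
otherwise repeated updates with the same item would land in different
registers.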

Merge hash-compatibility validation (#205)

merge/2 on HLL, ULL, Theta, and CMS now validates that both sketches use the
same hash strategy and seed. Merging sketches with different hashing
configurations raises IncompatibleSketchesError instead of producing silently
corrupted results:

a = ExDataSketch.HLL.new(seed: 1)
b = ExDataSketch.HLL.new(seed: 2)
ExDataSketch.HLL.merge(a, b)
# ** (ExDataSketch.Errors.IncompatibleSketchesError) HLL seed mismatch: 1 vs 2

Merging sketches with custom :hash_fn is explicitly rejected since hash
function compatibility cannot be verified at runtime.
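
The two rules above (reject any custom :hash_fn, then require equal seeds) can
be sketched as a standalone check. This is not the library's implementation:
the MergeCheckSketch module, its IncompatibleError exception, and the plain
maps standing in for sketch structs are all hypothetical.

```elixir
defmodule MergeCheckSketch do
  # Hypothetical stand-in for ExDataSketch.Errors.IncompatibleSketchesError.
  defmodule IncompatibleError do
    defexception [:message]
  end

  # Sketches are modelled as maps with :seed and :hash_fn keys (an assumption).
  # Any custom hash function blocks the merge, since two functions cannot be
  # proven equivalent at runtime.
  def validate!(%{hash_fn: f1}, %{hash_fn: f2}) when not is_nil(f1) or not is_nil(f2) do
    raise IncompatibleError, message: "cannot merge sketches that use a custom :hash_fn"
  end

  # With the default hash on both sides, the seeds must match.
  def validate!(%{seed: s1}, %{seed: s2}) when s1 != s2 do
    raise IncompatibleError, message: "seed mismatch: #{s1} vs #{s2}"
  end

  def validate!(_a, _b), do: :ok
end
```

Running the check before touching any registers means an incompatible merge
fails without producing a partially merged sketch.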

Quotient filter wrap-around fix (#203, #204)

Both Pure and Rust backends had a bug in extract_all where clusters wrapping
from slot N-1 to slot 0 caused nil quotients (Pure crash) or silent data
corruption (Rust). The fix correctly tracks the current quotient across array
boundary wrap-around.
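
The wrap-around case can be illustrated with a simplified decoder. The sketch
below is not the library's extract_all: it assumes the textbook quotient
filter layout (occupied, continuation, and shifted bits per slot), models
slots as a tuple of maps with nil for empty slots, and uses a hypothetical
module name.

```elixir
defmodule QuotientWalkSketch do
  # A slot is nil when empty, otherwise a map holding the three metadata
  # bits plus the stored remainder.
  def extract_all(slots) when is_tuple(slots) do
    n = tuple_size(slots)

    # Start at a cluster head: a non-empty slot whose entry was not shifted.
    case Enum.find(0..(n - 1)//1, &cluster_head?(elem(slots, &1))) do
      nil -> []
      start -> walk(slots, n, start)
    end
  end

  defp cluster_head?(nil), do: false
  defp cluster_head?(slot), do: not slot.shifted

  defp walk(slots, n, start) do
    {_pending, _current, acc} =
      Enum.reduce(0..(n - 1)//1, {[], nil, []}, fn k, {pending, current, acc} ->
        # Modular index: the walk continues from slot n - 1 into slot 0.
        i = rem(start + k, n)

        case elem(slots, i) do
          nil ->
            {pending, current, acc}

          slot ->
            # Quotient i owns a run that starts at or after slot i.
            pending = if slot.occupied, do: pending ++ [i], else: pending

            # A run start takes the oldest pending quotient; a continuation
            # keeps the current one, even across the array boundary.
            {current, pending} =
              if slot.continuation, do: {current, pending}, else: {hd(pending), tl(pending)}

            {pending, current, [{current, slot.remainder} | acc]}
        end
      end)

    Enum.reverse(acc)
  end
end
```

With four slots where quotient 3 stores two remainders, the second remainder
lands in slot 0 as a shifted continuation. A decoder that scans from slot 0
with no current quotient has nothing to attribute that entry to, which matches
the nil-quotient symptom described above; starting at a cluster head and
carrying the quotient through the modular index avoids it.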

Pure backend optimizations

  • HLL update_many: A pre-aggregated map applied with a sorted binary splice
    replaces the tuple-based path that copied the full register tuple once
    per hash, reducing transient allocation from O(n * m) to O(n + m) for n
    hashes and m registers.
  • Batch path restoration: The Pure backend update_many for HLL, ULL, and
    Theta once again chunks its input and calls the batch *_update_many
    functions instead of per-item *_update.
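
The pre-aggregation idea in the first bullet can be sketched as follows. This
is a simplified illustration under several assumptions, not the Pure backend's
code: the module name is hypothetical, registers are one byte each, precision
is fixed at 4 bits, and the register binary is rebuilt in a single pass rather
than through the sorted splice the release uses.

```elixir
defmodule PreAggregateSketch do
  import Bitwise

  # Assumed precision: 4 index bits (16 registers), 60 remaining hash bits.
  @p 4
  @rest_bits 60

  def update_many(registers, hashes) when is_binary(registers) do
    # Step 1: collapse n hashes into at most m {index => max rank} entries.
    maxima =
      Enum.reduce(hashes, %{}, fn hash, acc ->
        {idx, rank} = index_and_rank(hash)
        Map.update(acc, idx, rank, &max(&1, rank))
      end)

    # Step 2: rebuild the register binary once instead of once per hash.
    registers
    |> :binary.bin_to_list()
    |> Enum.with_index()
    |> Enum.map(fn {old, idx} -> max(old, Map.get(maxima, idx, 0)) end)
    |> :binary.list_to_bin()
  end

  # The top @p bits select the register; the rank is the number of leading
  # zeros in the remaining bits, plus one.
  defp index_and_rank(hash) do
    idx = hash >>> (64 - @p)
    rest = hash &&& ((1 <<< @rest_bits) - 1)
    {idx, leading_zeros(rest) + 1}
  end

  defp leading_zeros(0), do: @rest_bits
  defp leading_zeros(rest), do: @rest_bits - length(Integer.digits(rest, 2))
end
```

The map never grows beyond m entries regardless of n, and the registers are
copied once per batch, which is the O(n + m) shape described in the bullet.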

Test infrastructure

  • 39 new tests covering deserialization edge cases, custom hash_fn paths,
    Rust-only helper functions, and merge hash-compatibility validation.
  • Configurable coverage baselines via EX_DATA_SKETCH_COVERAGE_BASELINE env
    var for separate pure-only and Rust CI coverage thresholds.
  • Coverage reporting added to test-rust CI job.
  • Hash-dependent vector tests tagged with @tag :rust_nif for correct
    pure-only CI behavior.

Closed Issues

  • #198 -- Wire :hash_fn and :seed options through HLL, ULL, Theta, CMS
  • #202 -- Move hashing into NIF batch calls via ListIterator
  • #203 -- Quotient filter Pure backend wrap-around bug
  • #204 -- Quotient filter Rust backend wrap-around bug
  • #205 -- Merge hash-compatibility validation

Installation

def deps do
  [
    {:ex_data_sketch, "~> 0.7.1"}
  ]
end

Precompiled Rust NIF binaries are downloaded automatically on supported
platforms (macOS ARM64/x86_64, Linux x86_64/aarch64 glibc/musl), so no Rust
toolchain is required. The library runs in pure Elixir mode on all other
platforms.

Upgrade Notes

  • No breaking changes from v0.7.0.
  • EXSK binaries produced by earlier v0.x releases remain fully compatible.
  • Sketches serialized without a seed default to seed 0 on deserialization,
    preserving backward compatibility.
  • The NIF batch hashing path is used automatically when no custom :hash_fn
    is set. Custom hash function users see no behavior change.
  • CMS merge/2 now validates width, depth, and counter_width explicitly
    instead of comparing full option keyword lists.

What's Next

v0.8.0 directions under consideration include Binary Fuse Filters for even
smaller static membership filters, Ribbon Filters for space-optimal static
filters, and expanded Apache DataSketches interop for cross-language sketch
exchange.

Links