Skip to content

v0.6.0

Choose a tag to compare

@github-actions github-actions released this 13 Mar 12:18
· 42 commits to main since this release
cb619fb

ExDataSketch v0.6.0 Release Notes

Release date: 2026-03-12

Summary

v0.6.0 adds two new sketch algorithms (REQ and Misra-Gries), integrates
XXHash3 as an opt-in hash function via Rust NIF, and delivers Rust NIF
acceleration for all six membership filters
(Bloom, Cuckoo, Quotient, CQF,
XorFilter, IBLT). Every membership filter now has byte-identical parity between
the Pure Elixir and Rust NIF backends, verified by deterministic parity tests.

ExDataSketch now covers 16 sketch types across eight categories:

Category Algorithms
Cardinality HyperLogLog (HLL)
Frequency Count-Min Sketch (CMS)
Set operations Theta Sketch
Quantiles KLL, DDSketch, REQ
Frequency ranking FrequentItems (SpaceSaving), Misra-Gries
Membership Bloom, Cuckoo, Quotient, CQF, XorFilter
Set reconciliation IBLT
Composition FilterChain

What's new in v0.6.0

REQ Sketch (ExDataSketch.REQ)

Relative-error quantile sketch with configurable accuracy bias. REQ provides
rank-proportional error: the relative error at rank r is bounded by
alpha * r (HRA mode) or alpha * (1-r) (LRA mode), making it ideal for
high-percentile monitoring where tail accuracy matters most.

Key features:

  • High-rank accuracy (HRA) and low-rank accuracy (LRA) modes
  • Configurable k parameter (accuracy vs memory tradeoff)
  • quantile/2, quantiles/2, rank/2 queries
  • Merge support for distributed aggregation
  • REQ1 binary state format
  • EXSK serialization (sketch ID 13)

Quick start:

req = ExDataSketch.REQ.new(k: 12, hra: true)
req = ExDataSketch.REQ.update_many(req, 1..100_000)

ExDataSketch.REQ.quantile(req, 0.99)   # 99th percentile with high accuracy
ExDataSketch.REQ.quantile(req, 0.999)  # 99.9th percentile
ExDataSketch.REQ.rank(req, 50_000.0)   # normalized rank of a value

Misra-Gries Sketch (ExDataSketch.MisraGries)

Deterministic heavy-hitter detection with guaranteed frequency thresholds.
Unlike the probabilistic FrequentItems (SpaceSaving), Misra-Gries provides
a hard guarantee: any item with true frequency exceeding n/k is tracked.

Key features:

  • Deterministic guarantee: items above n/k frequency are always tracked
  • Configurable key encoding: :binary, :int, {:term, :external}
  • estimate/2 for per-item frequency lower bounds
  • top_k/2 for ranked heavy hitters
  • frequent/2 for threshold-based filtering
  • Merge support
  • MG01 binary state format
  • EXSK serialization (sketch ID 14)

Quick start:

mg = ExDataSketch.MisraGries.new(k: 10)
mg = Enum.reduce(1..1000, mg, fn _, s -> ExDataSketch.MisraGries.update(s, "hot") end)
mg = Enum.reduce(1..100, mg, fn i, s -> ExDataSketch.MisraGries.update(s, "cold_#{i}") end)

ExDataSketch.MisraGries.top_k(mg, 5)       # [{"hot", 1000}, ...]
ExDataSketch.MisraGries.estimate(mg, "hot") # 1000

XXHash3 Integration (ExDataSketch.Hash)

XXHash3 is now available as an opt-in hash function, providing faster hashing
with better distribution than the default phash2-based hash. When the Rust NIF
is available, XXHash3 output is stable across platforms and OTP versions.

Key features:

  • xxhash3_64/1 and xxhash3_64/2 (with seed)
  • Rust NIF implementation for speed
  • Automatic phash2-based fallback when NIF is unavailable
  • Seeds are masked to u64 range for safe NIF interop
  • Opt-in for backwards compatibility with existing serialized data

Quick start:

# Use XXHash3 as hash function for any sketch
hll = ExDataSketch.HLL.new(p: 14, hash_fn: &ExDataSketch.Hash.xxhash3_64/1)
hll = ExDataSketch.HLL.update_many(hll, ["alice", "bob", "carol"])
ExDataSketch.HLL.estimate(hll)

KLL cdf/pmf and DDSketch rank

  • ExDataSketch.KLL.cdf/2 -- cumulative distribution function at split points
  • ExDataSketch.KLL.pmf/2 -- probability mass function at split points
  • ExDataSketch.DDSketch.rank/2 -- normalized rank of a value

Quantiles Facade (ExDataSketch.Quantiles)

Unified API for quantile sketches. Write algorithm-agnostic code that works
with either KLL or DDSketch:

sketch = ExDataSketch.Quantiles.new(type: :kll)
sketch = ExDataSketch.Quantiles.update_many(sketch, 1..10_000)
ExDataSketch.Quantiles.quantile(sketch, 0.5)
ExDataSketch.Quantiles.count(sketch)

Rust NIF Acceleration for Membership Filters

All six membership filters now have Rust NIF acceleration for batch operations.
The NIF backend is used automatically when available, with dirty scheduler
thresholds to avoid blocking normal schedulers on large inputs.

Filter Accelerated operations Dirty threshold
Bloom put_many, merge 10,000 / 50,000 items
Cuckoo put_many 10,000 items
Quotient put_many, merge 10,000 / 50,000 items
CQF put_many, merge 10,000 / 50,000 items
XorFilter build 10,000 items
IBLT put_many, merge 10,000 / 50,000 items

Both backends produce byte-identical serialized output for the same inputs,
verified by deterministic parity tests.

Benchmarks

New benchmark suites:

  • bench/req_bench.exs -- REQ sketch operations
  • bench/misra_gries_bench.exs -- Misra-Gries operations
  • bench/xxhash3_bench.exs -- XXHash3 NIF throughput at various data sizes

Existing membership filter benchmarks now automatically show Pure vs Rust
comparison columns when the NIF is available.

Run all benchmarks with mix bench.

Algorithm Matrix

Algorithm Purpose EXSK ID Backend
HLL Cardinality estimation 1 Pure + Rust
CMS Frequency estimation 2 Pure + Rust
Theta Set operations 3 Pure + Rust
KLL Rank/quantile/cdf/pmf 4 Pure + Rust
DDSketch Value-relative quantiles/rank 5 Pure + Rust
FrequentItems Probabilistic heavy-hitter detection 6 Pure + Rust
Bloom Membership testing 7 Pure + Rust
Cuckoo Membership with deletion 8 Pure + Rust
Quotient Membership with deletion/merge 9 Pure + Rust
CQF Multiset membership/counting 10 Pure + Rust
XorFilter Static membership testing 11 Pure + Rust
IBLT Set reconciliation 12 Pure + Rust
REQ Relative-error quantiles 13 Pure
MisraGries Deterministic heavy hitters 14 Pure
FilterChain Filter composition -- Pure
Quantiles Quantile facade (KLL/DDSketch) -- --

Installation

def deps do
  [
    {:ex_data_sketch, "~> 0.6.0"}
  ]
end

Precompiled Rust NIF binaries are downloaded automatically on supported
platforms (macOS ARM64/x86_64, Linux x86_64/aarch64 glibc/musl). No Rust
toolchain required. The library works in pure Elixir mode on all other
platforms.

Upgrade Notes

  • No breaking changes from v0.5.0.
  • EXSK binaries produced by earlier v0.x releases remain fully compatible.
  • The ExDataSketch.Backend behaviour now includes additional callbacks for
    REQ and Misra-Gries. Custom backend implementations must add these callbacks.
  • ExDataSketch.Quantiles no longer includes REQ in the facade (KLL and
    DDSketch only). REQ is accessed directly via ExDataSketch.REQ.
  • Membership filter NIF acceleration is automatic and transparent. No code
    changes are needed to benefit from the Rust backend.
  • The Quantiles.quantiles/2 typespec was tightened from
    [float() | nil] | nil to [float() | nil].

Rust NIF Safety Hardening

This release includes several safety improvements to the Rust NIF layer:

  • All recursive traversals in quotient filter slot arithmetic converted to
    bounded iterative loops to prevent stack overflow
  • shift_right bounded to at most slot_count iterations to prevent infinite
    loops on full/corrupt filter state
  • Binary header validation before slicing in all NIF entry points to prevent
    out-of-bounds panics
  • Parameter validation (zero bucket counts, out-of-range fingerprint bits) to
    prevent arithmetic overflow
  • OwnedBinary::new() allocation failure returns error tuples instead of
    panicking
  • XorFilter build uses deterministic deduplication (sort + dedup) and
    deterministic peeling order for cross-node reproducibility
  • XorFilter seed retry uses wrapping addition to prevent u32 overflow panic
  • MisraGries binary_to_term uses [:safe] option to prevent atom-table
    exhaustion from untrusted data

What's Next

v0.7.0 adds UltraLogLog (ULL), a next-generation cardinality estimation
sketch based on Ertl (2023). ULL delivers approximately 20% lower relative
error than HyperLogLog at the same memory footprint, with full Pure Elixir
and Rust NIF backends, EXSK serialization, and distributed merge support.

Links