v0.6.0
ExDataSketch v0.6.0 Release Notes
Release date: 2026-03-12
Summary
v0.6.0 adds two new sketch algorithms (REQ and Misra-Gries), integrates
XXHash3 as an opt-in hash function via Rust NIF, and delivers Rust NIF
acceleration for all six membership filters (Bloom, Cuckoo, Quotient, CQF,
XorFilter, IBLT). Every membership filter now has byte-identical parity between
the Pure Elixir and Rust NIF backends, verified by deterministic parity tests.
ExDataSketch now covers 16 sketch types across eight categories:
| Category | Algorithms |
|---|---|
| Cardinality | HyperLogLog (HLL) |
| Frequency | Count-Min Sketch (CMS) |
| Set operations | Theta Sketch |
| Quantiles | KLL, DDSketch, REQ |
| Frequency ranking | FrequentItems (SpaceSaving), Misra-Gries |
| Membership | Bloom, Cuckoo, Quotient, CQF, XorFilter |
| Set reconciliation | IBLT |
| Composition | FilterChain |
What's new in v0.6.0
REQ Sketch (ExDataSketch.REQ)
Relative-error quantile sketch with configurable accuracy bias. REQ provides
rank-proportional error: the relative error at rank r is bounded by
alpha * r (HRA mode) or alpha * (1-r) (LRA mode), making it ideal for
high-percentile monitoring where tail accuracy matters most.
Key features:
- High-rank accuracy (HRA) and low-rank accuracy (LRA) modes
- Configurable k parameter (accuracy vs memory tradeoff)
quantile/2,quantiles/2,rank/2queries- Merge support for distributed aggregation
- REQ1 binary state format
- EXSK serialization (sketch ID 13)
Quick start:
req = ExDataSketch.REQ.new(k: 12, hra: true)
req = ExDataSketch.REQ.update_many(req, 1..100_000)
ExDataSketch.REQ.quantile(req, 0.99) # 99th percentile with high accuracy
ExDataSketch.REQ.quantile(req, 0.999) # 99.9th percentile
ExDataSketch.REQ.rank(req, 50_000.0) # normalized rank of a valueMisra-Gries Sketch (ExDataSketch.MisraGries)
Deterministic heavy-hitter detection with guaranteed frequency thresholds.
Unlike the probabilistic FrequentItems (SpaceSaving), Misra-Gries provides
a hard guarantee: any item with true frequency exceeding n/k is tracked.
Key features:
- Deterministic guarantee: items above n/k frequency are always tracked
- Configurable key encoding:
:binary,:int,{:term, :external} estimate/2for per-item frequency lower boundstop_k/2for ranked heavy hittersfrequent/2for threshold-based filtering- Merge support
- MG01 binary state format
- EXSK serialization (sketch ID 14)
Quick start:
mg = ExDataSketch.MisraGries.new(k: 10)
mg = Enum.reduce(1..1000, mg, fn _, s -> ExDataSketch.MisraGries.update(s, "hot") end)
mg = Enum.reduce(1..100, mg, fn i, s -> ExDataSketch.MisraGries.update(s, "cold_#{i}") end)
ExDataSketch.MisraGries.top_k(mg, 5) # [{"hot", 1000}, ...]
ExDataSketch.MisraGries.estimate(mg, "hot") # 1000XXHash3 Integration (ExDataSketch.Hash)
XXHash3 is now available as an opt-in hash function, providing faster hashing
with better distribution than the default phash2-based hash. When the Rust NIF
is available, XXHash3 output is stable across platforms and OTP versions.
Key features:
xxhash3_64/1andxxhash3_64/2(with seed)- Rust NIF implementation for speed
- Automatic phash2-based fallback when NIF is unavailable
- Seeds are masked to u64 range for safe NIF interop
- Opt-in for backwards compatibility with existing serialized data
Quick start:
# Use XXHash3 as hash function for any sketch
hll = ExDataSketch.HLL.new(p: 14, hash_fn: &ExDataSketch.Hash.xxhash3_64/1)
hll = ExDataSketch.HLL.update_many(hll, ["alice", "bob", "carol"])
ExDataSketch.HLL.estimate(hll)KLL cdf/pmf and DDSketch rank
ExDataSketch.KLL.cdf/2-- cumulative distribution function at split pointsExDataSketch.KLL.pmf/2-- probability mass function at split pointsExDataSketch.DDSketch.rank/2-- normalized rank of a value
Quantiles Facade (ExDataSketch.Quantiles)
Unified API for quantile sketches. Write algorithm-agnostic code that works
with either KLL or DDSketch:
sketch = ExDataSketch.Quantiles.new(type: :kll)
sketch = ExDataSketch.Quantiles.update_many(sketch, 1..10_000)
ExDataSketch.Quantiles.quantile(sketch, 0.5)
ExDataSketch.Quantiles.count(sketch)Rust NIF Acceleration for Membership Filters
All six membership filters now have Rust NIF acceleration for batch operations.
The NIF backend is used automatically when available, with dirty scheduler
thresholds to avoid blocking normal schedulers on large inputs.
| Filter | Accelerated operations | Dirty threshold |
|---|---|---|
| Bloom | put_many, merge |
10,000 / 50,000 items |
| Cuckoo | put_many |
10,000 items |
| Quotient | put_many, merge |
10,000 / 50,000 items |
| CQF | put_many, merge |
10,000 / 50,000 items |
| XorFilter | build |
10,000 items |
| IBLT | put_many, merge |
10,000 / 50,000 items |
Both backends produce byte-identical serialized output for the same inputs,
verified by deterministic parity tests.
Benchmarks
New benchmark suites:
bench/req_bench.exs-- REQ sketch operationsbench/misra_gries_bench.exs-- Misra-Gries operationsbench/xxhash3_bench.exs-- XXHash3 NIF throughput at various data sizes
Existing membership filter benchmarks now automatically show Pure vs Rust
comparison columns when the NIF is available.
Run all benchmarks with mix bench.
Algorithm Matrix
| Algorithm | Purpose | EXSK ID | Backend |
|---|---|---|---|
| HLL | Cardinality estimation | 1 | Pure + Rust |
| CMS | Frequency estimation | 2 | Pure + Rust |
| Theta | Set operations | 3 | Pure + Rust |
| KLL | Rank/quantile/cdf/pmf | 4 | Pure + Rust |
| DDSketch | Value-relative quantiles/rank | 5 | Pure + Rust |
| FrequentItems | Probabilistic heavy-hitter detection | 6 | Pure + Rust |
| Bloom | Membership testing | 7 | Pure + Rust |
| Cuckoo | Membership with deletion | 8 | Pure + Rust |
| Quotient | Membership with deletion/merge | 9 | Pure + Rust |
| CQF | Multiset membership/counting | 10 | Pure + Rust |
| XorFilter | Static membership testing | 11 | Pure + Rust |
| IBLT | Set reconciliation | 12 | Pure + Rust |
| REQ | Relative-error quantiles | 13 | Pure |
| MisraGries | Deterministic heavy hitters | 14 | Pure |
| FilterChain | Filter composition | -- | Pure |
| Quantiles | Quantile facade (KLL/DDSketch) | -- | -- |
Installation
def deps do
[
{:ex_data_sketch, "~> 0.6.0"}
]
endPrecompiled Rust NIF binaries are downloaded automatically on supported
platforms (macOS ARM64/x86_64, Linux x86_64/aarch64 glibc/musl). No Rust
toolchain required. The library works in pure Elixir mode on all other
platforms.
Upgrade Notes
- No breaking changes from v0.5.0.
- EXSK binaries produced by earlier v0.x releases remain fully compatible.
- The
ExDataSketch.Backendbehaviour now includes additional callbacks for
REQ and Misra-Gries. Custom backend implementations must add these callbacks. ExDataSketch.Quantilesno longer includes REQ in the facade (KLL and
DDSketch only). REQ is accessed directly viaExDataSketch.REQ.- Membership filter NIF acceleration is automatic and transparent. No code
changes are needed to benefit from the Rust backend. - The
Quantiles.quantiles/2typespec was tightened from
[float() | nil] | nilto[float() | nil].
Rust NIF Safety Hardening
This release includes several safety improvements to the Rust NIF layer:
- All recursive traversals in quotient filter slot arithmetic converted to
bounded iterative loops to prevent stack overflow shift_rightbounded to at mostslot_countiterations to prevent infinite
loops on full/corrupt filter state- Binary header validation before slicing in all NIF entry points to prevent
out-of-bounds panics - Parameter validation (zero bucket counts, out-of-range fingerprint bits) to
prevent arithmetic overflow OwnedBinary::new()allocation failure returns error tuples instead of
panicking- XorFilter build uses deterministic deduplication (sort + dedup) and
deterministic peeling order for cross-node reproducibility - XorFilter seed retry uses wrapping addition to prevent u32 overflow panic
- MisraGries
binary_to_termuses[:safe]option to prevent atom-table
exhaustion from untrusted data
What's Next
v0.7.0 adds UltraLogLog (ULL), a next-generation cardinality estimation
sketch based on Ertl (2023). ULL delivers approximately 20% lower relative
error than HyperLogLog at the same memory footprint, with full Pure Elixir
and Rust NIF backends, EXSK serialization, and distributed merge support.