Exactly-once delivery with per-sink independent checkpoints by vnvo · Pull Request #62 · vnvo/deltaforge

vnvo · 2026-03-29T20:06:04Z

Summary

Kafka transactional producer (exactly_once: true) — begin_transaction/commit_transaction per batch, producer fencing detection (SinkError::Fatal), transaction.timeout.ms=60000
NATS JetStream dedup — Nats-Msg-Id header on every message for server-side dedup within duplicate_window
Redis idempotency — idempotency_key field in every XADD payload for consumer-side dedup
Per-sink independent checkpoints — each sink advances its own checkpoint; source replays from min(sink checkpoints); fastest sink never waits for slowest
Legacy checkpoint fallback — seamless migration from pre-per-sink checkpoint format
Source::compare_checkpoints — required trait method for correct checkpoint ordering (MySQL file+pos, Postgres LSN→u64, Turso change_id)
Coordinator hardening — policy gate before commits (not after), parallel per-sink commits via join_all, partial batch flush fix (recv_many timeout)
Default max_bytes bumped 3MB → 16MB — old default silently capped batches, causing disproportionate exactly-once overhead
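
The checkpoint ordering and min-replay rule above can be sketched as follows. This is a minimal illustration, not the actual DeltaForge trait: the `Checkpoint` enum, `binlog_seq`, `parse_pg_lsn`, and `replay_from` are hypothetical names, and comparing MySQL binlog files by their numeric suffix is an assumption of this sketch. The Postgres conversion uses the textual LSN format, where the hex value before the slash is the high 32 bits and the value after it is the low 32 bits.

```rust
use std::cmp::Ordering;

/// Hypothetical checkpoint shapes, named after the formats listed above.
#[derive(Debug, Clone, PartialEq)]
enum Checkpoint {
    /// MySQL binlog coordinates: file name plus byte offset.
    MySql { file: String, pos: u64 },
    /// Postgres LSN already converted to a u64.
    Postgres { lsn: u64 },
}

/// Parse a textual Postgres LSN such as "16/B374D848" into a u64.
fn parse_pg_lsn(text: &str) -> Option<u64> {
    let (hi, lo) = text.split_once('/')?;
    let hi = u32::from_str_radix(hi, 16).ok()?;
    let lo = u32::from_str_radix(lo, 16).ok()?;
    Some(((hi as u64) << 32) | lo as u64)
}

/// Extract the numeric suffix of a binlog file name ("mysql-bin.000012" -> 12).
fn binlog_seq(file: &str) -> Option<u64> {
    file.rsplit('.').next()?.parse().ok()
}

/// Order two checkpoints from the same source; None if they are not comparable.
fn compare_checkpoints(a: &Checkpoint, b: &Checkpoint) -> Option<Ordering> {
    match (a, b) {
        (Checkpoint::MySql { file: fa, pos: pa }, Checkpoint::MySql { file: fb, pos: pb }) => {
            let (sa, sb) = (binlog_seq(fa)?, binlog_seq(fb)?);
            // File sequence first, byte offset as the tie-breaker.
            Some(sa.cmp(&sb).then(pa.cmp(pb)))
        }
        (Checkpoint::Postgres { lsn: la }, Checkpoint::Postgres { lsn: lb }) => Some(la.cmp(lb)),
        _ => None,
    }
}

/// The replay point is the smallest checkpoint any sink has committed.
fn replay_from(sink_checkpoints: &[Checkpoint]) -> Option<Checkpoint> {
    let mut min = sink_checkpoints.first()?;
    for cp in &sink_checkpoints[1..] {
        if compare_checkpoints(cp, min)? == Ordering::Less {
            min = cp;
        }
    }
    Some(min.clone())
}
```

Returning `Option<Ordering>` keeps checkpoints from different sources incomparable instead of silently ordering them; a real implementation may prefer an error type.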

Benchmarks (tuned, single dev machine, Docker containers)

Source	at-least-once	exactly-once	Overhead
MySQL 10M rows	151K events/s	134K events/s	~11%
Postgres 1M rows	57K events/s	53K events/s	~7%
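
The Overhead column is consistent with the relative throughput loss, (at-least-once minus exactly-once) divided by at-least-once. A small sketch of that arithmetic, with `overhead_pct` as a hypothetical helper name:

```rust
/// Relative throughput loss of exactly-once versus at-least-once, in percent.
fn overhead_pct(at_least_once: f64, exactly_once: f64) -> f64 {
    (at_least_once - exactly_once) / at_least_once * 100.0
}
```

Applied to the table, `overhead_pct(151.0, 134.0)` is about 11.3 and `overhead_pct(57.0, 53.0)` is about 7.0, matching the rounded ~11% and ~7%.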

Chaos & Testing

--scenario exactly-once crash recovery with read_committed consumer verification
4 Kafka transactional integration tests (atomic batch, single event, producer fencing, multi-batch ordering)
Configurable drain: --drain-target, --drain-writers, --drain-timeout, --drain-max-bytes, --exactly-once
Effective pipeline config dump before each benchmark run
Partial batch flush regression test

Infrastructure

Docker Compose profile isolation: base, mysql-infra, pg-infra, kafka-infra (start/stop DeltaForge without touching infra)
Loki + Promtail log aggregation — view all container logs in Grafana
Grafana dashboard: template variables (instance/pipeline/source/sink/processor), "Checkpoints & Exactly-Once" row, transaction commit/abort panels, repeating per-pipeline detail rows
Prometheus + Loki retention set to 7 days
New metrics: deltaforge_sink_txn_commits_total, deltaforge_sink_txn_aborts_total, per-sink checkpoint status/age gauges

Docs

Performance guide updated with corrected benchmarks and max_bytes tuning guidance
Kafka sink: exactly-once section (requirements, transactional overrides, fatal errors, performance)
NATS sink: deduplication section (Nats-Msg-Id, duplicate_window)
Redis sink: idempotency section (idempotency_key field)
Sinks overview: per-sink checkpoints, delivery guarantee tiers table
Landing page: updated feature cards, benchmark numbers, meta tags
Changelog updated

Test plan

cargo test --workspace --lib — 365+ tests pass
cargo clippy --all-targets --all-features -- -D warnings — clean
cargo test -p sinks --test kafka_sink_tests -- --include-ignored — 4 new EO tests pass (requires Docker)
Drain benchmark: MySQL 134K events/s with EO, 151K without
Drain benchmark: Postgres 53K events/s with EO, 57K without
--scenario exactly-once --source mysql chaos crash recovery
--scenario all --source mysql regression (existing scenarios still pass)

Checklist

Tests pass (cargo test)
Code formatted (cargo fmt)

Kafka transactional producer (begin/commit per batch), NATS Nats-Msg-Id server-side dedup, Redis idempotency_key for consumer-side dedup.

Per-sink checkpoints: each sink advances independently, source replays from min(sink checkpoints). Legacy checkpoint fallback for seamless migration. Source::compare_checkpoints trait method for correct ordering (MySQL file+pos, Postgres LSN, Turso change_id).

Coordinator: policy gate before checkpoint commits, parallel per-sink commits via join_all, SinkError::Fatal for producer fencing.

Benchmarks (tuned, single dev machine):
- MySQL: 151K events/s (at-least-once) / 134K (exactly-once), 11% overhead
- Postgres: 57K events/s (at-least-once) / 53K (exactly-once), 7% overhead
- Default max_bytes bumped 3MB → 16MB to prevent batch byte-capping

Chaos: exactly-once crash recovery scenario, configurable drain target/writers/timeout/max_bytes, effective config dump before each run.

Infra: Docker Compose profiles (base/mysql-infra/pg-infra/kafka-infra), Loki+Promtail log aggregation, Grafana dashboard with template variables (instance/pipeline/source/sink/processor), per-sink checkpoint and transaction commit/abort panels.

Docs: updated performance guide, Kafka/NATS/Redis sink references, per-sink checkpoint architecture, delivery guarantee tiers.
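
For the Redis path, dedup is left to the consumer. A minimal sketch of what a consumer could do with the idempotency_key field, assuming an in-memory set of seen keys; `Entry`, `Deduper`, and `apply_once` are hypothetical names, and the `payload` field stands in for whatever else the XADD entry carries:

```rust
use std::collections::HashSet;

/// A stream entry as a consumer might decode it (shape is an assumption).
struct Entry {
    idempotency_key: String,
    payload: String,
}

/// Remembers which idempotency keys have already been applied.
struct Deduper {
    seen: HashSet<String>,
}

impl Deduper {
    fn new() -> Self {
        Deduper { seen: HashSet::new() }
    }

    /// Returns true only the first time a key is observed.
    fn first_time(&mut self, key: &str) -> bool {
        self.seen.insert(key.to_string())
    }
}

/// Apply each entry at most once, skipping replayed duplicates.
fn apply_once(dedup: &mut Deduper, entries: &[Entry]) -> Vec<String> {
    entries
        .iter()
        .filter(|e| dedup.first_time(&e.idempotency_key))
        .map(|e| e.payload.clone())
        .collect()
}
```

The set here grows without bound and is lost on restart; a real consumer would likely bound it by time or persist it alongside the data it writes.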

recv_many blocked indefinitely when no events were available, preventing the select! loop's ticker branch from flushing the partial batch. Wrap recv_many in tokio::time::timeout(tick_ms) so control returns to the timer when the source goes idle. Adds regression test: test_partial_batch_flushed_by_timer sends 3 events to a coordinator with max_events=10000, keeps the channel open, and asserts the timer flushes the partial batch within max_ms.
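
The shape of this fix can be illustrated with a standard-library analogue. This is not the coordinator code: it swaps tokio's recv_many and tokio::time::timeout for `std::sync::mpsc::Receiver::recv_timeout`, and `next_batch` with its parameters is a hypothetical name. The point it shows is the same: a bounded wait on the receive lets the loop get back to its deadline check when the source goes idle.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collect up to `max_events` events, but never hold a partial batch
/// longer than `max_wait`. `tick` bounds each individual receive.
fn next_batch<T>(
    rx: &Receiver<T>,
    max_events: usize,
    max_wait: Duration,
    tick: Duration,
) -> Vec<T> {
    let deadline = Instant::now() + max_wait;
    let mut batch = Vec::new();
    while batch.len() < max_events && Instant::now() < deadline {
        match rx.recv_timeout(tick) {
            Ok(event) => batch.push(event),
            // Idle channel: fall through so the deadline is re-checked.
            Err(RecvTimeoutError::Timeout) => {}
            // All senders dropped: flush whatever has been collected.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}
```

With an unbounded `recv()` in place of `recv_timeout(tick)`, the loop would never re-check the deadline while the channel is open and empty, which is the stall described above.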

Update pipeline_e2e bench to register commit_fn per sink (new API). Remove needless borrow in chaos harness.

vnvo added 6 commits March 29, 2026 22:15

fix: clippy lints and bench compile with per-sink commit_fn API

ab8423f

format fixes

5bdc69a

docs: index.html updated

916acbc

changelog updated

73145f7

vnvo merged commit 4bcfb34 into main Mar 29, 2026
4 checks passed

vnvo merged 6 commits into main from exactly-once
