- Add postgres support to backlog drain benchmark (run_pg)
- Add Config Lab tab with A/B comparison runner and presets
- Add PATCH proxy endpoint to chaos UI backend
- Fix pg_writer_loop: reconnect on error, f64->numeric type mismatch
- Fix escape_like: escape underscore, convert glob * to SQL %
- Promote writer errors from debug to warn level
- Add df_base/pipeline fields to SoakSource
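The escape_like fix above can be illustrated with a minimal sketch. This is not the project's implementation: it assumes backslash is the LIKE escape character and that literal % and backslash are escaped alongside underscore, neither of which the commit message confirms.

```rust
/// Hypothetical sketch: convert a glob-style pattern into a SQL LIKE pattern.
/// Assumption: backslash is the LIKE escape character.
fn escape_like(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    for ch in pattern.chars() {
        match ch {
            // Glob wildcard becomes the SQL multi-character wildcard.
            '*' => out.push('%'),
            // LIKE metacharacters (and the escape itself) are escaped
            // so they match literally.
            '%' | '_' | '\\' => {
                out.push('\\');
                out.push(ch);
            }
            _ => out.push(ch),
        }
    }
    out
}
```

With this sketch, a glob such as `user_*` matches only names starting with the literal text `user_`, instead of treating the underscore as a single-character wildcard.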
Source hot-path optimizations (profiler-guided):
- fast_uuid_v7: atomic counter replaces getrandom syscall (3.1% -> 0.2%)
- Remove #[instrument] from read_next_event/dispatch_event
- try_send with async fallback in send_event
- Arc<Vec<RelationColumn>> and Arc<str> qualified_name per relation
- Arc<PostgresTableSchema> in LoadedSchema cache
- Static PG_VERSION LazyLock, hand-written checkpoint JSON
- Pre-allocated batch vector in coordinator
Chaos UI:
- Config Lab tab with A/B comparison runner and presets
- Service refresh button (--no-deps --force-recreate)
- Dynamic port/URL link badges from docker port
- Grafana in-flight panel color thresholds (green=draining, orange=buffering)
- Wider services card layout
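A rough sketch of the fast_uuid_v7 idea follows: a process-wide atomic counter fills the bits that would otherwise come from the OS random source, so no syscall is needed per event. The function name, the bit placement of the counter, and the zeroed rand_a field are assumptions for illustration; the actual implementation may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical process-wide sequence standing in for random bits.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Sketch of a UUIDv7-shaped ID: 48-bit Unix millisecond timestamp,
/// version nibble 7, variant bits 10, and the counter in the tail bytes.
fn fast_uuid_v7_sketch() -> [u8; 16] {
    let ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis() as u64;
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);

    let mut b = [0u8; 16];
    // Low 48 bits of the timestamp, big-endian.
    b[..6].copy_from_slice(&ms.to_be_bytes()[2..]);
    // Version 7 in the high nibble; the rest of rand_a is left as zero here.
    b[6] = 0x70;
    // Counter fills the last 8 bytes, then the top two bits are
    // overwritten with the RFC variant marker (10).
    b[8..].copy_from_slice(&seq.to_be_bytes());
    b[8] = (b[8] & 0x3F) | 0x80;
    b
}
```

IDs produced this way are unique within one process and time-ordered, but they are predictable and carry no cross-process entropy, which is the trade made for dropping the syscall.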
- Update pgwire-replication to v0.3 (buffered WAL reads)
- Adapt Io error handling for Arc<io::Error> (use err.kind() not string match)
- Include ErrorKind in connect error details
- Fix clippy: range patterns, useless format!
Zero-copy tuple parsing:
- PgColumnValue::Text/Binary now hold Bytes slices (not String/Vec)
- parse_tuple_data uses Bytes::slice() instead of copy + from_utf8_lossy
- DML handlers pass payload_bytes for zero-copy slicing
Counter cache:
- RunCtx.counter_cache caches metrics::Counter per (table, op)
- send_event does entry().or_insert_with_key() - no hash lookup per event
LSN cache:
- RunCtx.cached_lsn avoids reformatting same LSN within a transaction
- make_checkpoint_meta_str accepts pre-formatted LSN string
Results (direct connection, 1M rows):
- Postgres: 48.5K avg, 52.1K peak (was 29.4K baseline)
- Now matches MySQL throughput (50K avg, 53.2K peak)
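The counter cache can be sketched with the standard library alone. Here `Counter` is a stand-in for metrics::Counter, and `CounterCache`, `registrations`, and `value` are hypothetical names; the real RunCtx fields and key types are not shown in this PR text.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Stand-in for metrics::Counter (assumption for this sketch).
type Counter = Arc<AtomicU64>;

struct CounterCache {
    cache: HashMap<(String, &'static str), Counter>,
    // Counts how often the slow "register a counter" path ran.
    registrations: usize,
}

impl CounterCache {
    fn new() -> Self {
        CounterCache { cache: HashMap::new(), registrations: 0 }
    }

    /// Increment the counter for (table, op), creating it on first use only.
    fn increment(&mut self, table: &str, op: &'static str) {
        let registrations = &mut self.registrations;
        let counter = self
            .cache
            .entry((table.to_owned(), op))
            .or_insert_with_key(|_key| {
                // Slow path: runs once per distinct (table, op) pair.
                *registrations += 1;
                Arc::new(AtomicU64::new(0))
            });
        counter.fetch_add(1, Ordering::Relaxed);
    }

    fn value(&self, table: &str, op: &'static str) -> u64 {
        self.cache
            .get(&(table.to_owned(), op))
            .map_or(0, |c| c.load(Ordering::Relaxed))
    }
}
```

The sketch allocates a String key per call for simplicity; a hot path would more likely key on an Arc<str> or an interned ID so that the per-event cost stays at one local map probe.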
- Add deltaforge_source_bytes_total and deltaforge_bytes_total counters
- Add per-scenario proxy bypass (--no-proxy) with pipeline DSN patching
- Redesign Grafana dashboard with collapsible rows, multi-pipeline support
- Add proxy toggle and service refresh to chaos UI
- Add toxiproxy proxy_states/proxy_summary helpers
Reuse connections via outer/inner loop pattern and use multi-row INSERT statements (64 rows per statement) to reduce round trips during backlog population.
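As an illustration of the multi-row INSERT shape, the following sketch builds one statement for a whole chunk of rows. The function name and the Postgres-style numbered placeholders are assumptions; a MySQL writer would use `?` placeholders instead.

```rust
/// Hypothetical helper: build a single INSERT covering `rows` rows,
/// using $1, $2, ... placeholders (Postgres style).
fn multi_row_insert(table: &str, columns: &[&str], rows: usize) -> String {
    let mut sql = format!("INSERT INTO {} ({}) VALUES ", table, columns.join(", "));
    let mut param = 1;
    for row in 0..rows {
        if row > 0 {
            sql.push_str(", ");
        }
        sql.push('(');
        for col in 0..columns.len() {
            if col > 0 {
                sql.push_str(", ");
            }
            // Placeholders are numbered consecutively across all rows.
            sql.push_str(&format!("${}", param));
            param += 1;
        }
        sql.push(')');
    }
    sql
}
```

Calling it with `rows = 64` yields one statement per 64 rows, so populating a backlog of N rows takes roughly N/64 round trips instead of N.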
Prevents linger wait from capping throughput on small coordinator batches. With linger.ms=20, each 200-event batch waited the full 20ms before rdkafka sent — limiting throughput to ~8K events/s.
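The cap follows from simple arithmetic: if every batch waits out the linger interval before it is sent, throughput cannot exceed batch size divided by time per batch. The sketch below assumes batches are sent strictly one after another; the `send_overhead_ms` parameter is hypothetical and only shows how a few milliseconds of per-batch overhead would bring the 10K ceiling down to the ~8K observed.

```rust
/// Upper bound on events/s when each batch waits `linger_ms` plus a
/// (hypothetical) per-batch send overhead before the next one can go out.
/// Returns None when the per-batch time is zero, i.e. no such cap applies.
fn batch_throughput_ceiling(
    batch_events: u64,
    linger_ms: u64,
    send_overhead_ms: u64,
) -> Option<u64> {
    (batch_events * 1000).checked_div(linger_ms + send_overhead_ms)
}
```

For 200-event batches and linger.ms=20 the ceiling is 10,000 events/s before any overhead; with linger.ms=0 the linger term disappears and delivery time becomes the only limit.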
Decouple event accumulation from sink delivery using a bounded mpsc channel (capacity = max_inflight). A dedicated delivery task processes batches in FIFO order, preserving checkpoint ordering while overlapping accumulation with delivery.
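The pattern can be sketched with a bounded standard-library channel and a thread standing in for the async delivery task. `run_pipeline`, the `u32` event type, and the returned checkpoint list are illustrative assumptions, not the coordinator's real API.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Accumulate batches on the caller's thread while a dedicated delivery
/// thread drains them in FIFO order. Returns the order in which batch
/// sequence numbers were "checkpointed".
fn run_pipeline(batches: Vec<Vec<u32>>, max_inflight: usize) -> Vec<usize> {
    // Bounded channel: at most `max_inflight` batches may be queued.
    let (tx, rx) = sync_channel::<(usize, Vec<u32>)>(max_inflight);

    let delivery = thread::spawn(move || {
        let mut checkpoints = Vec::new();
        // A single consumer reading one channel preserves send order.
        for (seq, batch) in rx {
            let _delivered = batch.len(); // placeholder for the sink write
            checkpoints.push(seq);
        }
        checkpoints
    });

    for (seq, batch) in batches.into_iter().enumerate() {
        // Blocks when the queue is full, which applies backpressure
        // to accumulation instead of buffering without limit.
        tx.send((seq, batch)).expect("delivery thread exited early");
    }
    drop(tx); // close the channel so the delivery loop terminates
    delivery.join().expect("delivery thread panicked")
}
```

Because there is exactly one consumer, checkpoints advance in the same order the batches were produced, while the producer can already build the next batch as the previous one is being delivered.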
- Drain defaults: max_events=4000, linger.ms=0, max_inflight=4
- Add --drain-max-inflight CLI arg
- MySQL writer: 64-row multi-INSERT with connection reuse
- Remove pool.disconnect() deadlock
- Migrate Kafka from cp-kafka:7.5.0+Zookeeper to cp-kafka:7.7.1 KRaft
- Upgrade pgwire-replication to 0.3.1 (drain-phase tight loop, reusable read buffer; +32% Postgres CDC throughput)
- Activity bar with per-button loading state and task history
- Unified console log with smart auto-scroll
- Docker image selector dropdown with tag/size/age metadata
- Stale image badge when container uses outdated image ID
- Remove zookeeper from UI service list
- Fix test_build_pattern_query to use glob * instead of literal %
- Replace fixed sleep(10s) with connection retry loop in failover tests
- Restore Dockerfile.debug --locked flag after local pgwire dev cycle
- Update changelog with throughput optimizations
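A retry loop of the kind that replaces the fixed sleep might look like the sketch below. `retry_until` and its parameters are hypothetical; the actual failover tests may structure the loop differently.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Call `attempt` until it succeeds or `timeout` elapses, pausing
/// `interval` between tries. Returns the last error on timeout.
fn retry_until<T, E>(
    timeout: Duration,
    interval: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            // Out of time: surface the most recent failure.
            Err(err) if Instant::now() >= deadline => return Err(err),
            // Still within budget: wait briefly and try again.
            Err(_) => sleep(interval),
        }
    }
}
```

Compared with a fixed sleep, the test proceeds as soon as the connection comes up and only fails after the full timeout, which makes it both faster on a healthy run and more tolerant of a slow one.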
- New performance.md with benchmark results, tuning parameters, source-specific notes, profiling instructions, and drain benchmark usage
- Update development.md with current drain defaults, pg-soak profile, expanded playground UI features
Summary
- DEFAULT_LINGER_MS lowered from 20ms to 5ms; was the primary throughput bottleneck capping small batches at ~8K events/s
- max_inflight setting decouples accumulation from delivery via bounded channel, preserving checkpoint ordering
- Drain defaults (linger.ms=0, max_events=4000)
- New docs/src/performance.md
What changed
Throughput
- DEFAULT_LINGER_MS: 20 → 5 (steady-state), drain uses linger.ms=0
- max_inflight (default 1, drain uses 4)
- Drain defaults: max_events=4000, max_inflight=4, linger.ms=0
Infrastructure
Chaos UI
- Byte counters (source_bytes_total, bytes_total)
- Per-scenario proxy bypass (--no-proxy)
Fixes
- Remove pool.disconnect() deadlock
- escape_like test using glob * instead of literal %
- Replace fixed sleep(10s) with connection retry loop in failover tests
- Restore --locked after local pgwire-replication dev cycle
Docs
Test plan
- cargo test --workspace: all unit tests pass (119 in sources)
- cargo test -p sources --test failover_e2e -- --include-ignored --test-threads=1: integration tests stable with retry loop
- Drain benchmark with --drain-max-events 4000 --drain-kafka-conf linger.ms=0 -> ~117K events/s
- docker compose up -d
Checklist
- cargo test
- cargo fmt