feat: distributed state gap closure — 14 items#89
Merged
Conversation
Implements the post-gap-analysis roadmap items (all except Admin CLI, deferred to separate work): Proto & wire format: - Add ChunkRequest/Response, SnapshotRequest/Entry/Response, Heartbeat messages to state_transport.proto - Add hlc_time field to Mutation (proto field 12) - Extend Frame oneof with 5 new fields - Add snapshot & chunk wire types to IPC transport (kMsgSnapReq/Rsp) Transport chunk fetch: - IPC transport: set_blob_store, fetch_chunk, chunk request/response - gRPC transport: set_blob_store, fetch_chunk, chunk request/response - Both use condition_variable waiter pattern with 10s timeout Session improvements: - Wire state_data_hydrator into distributed_state_session::join() - Add TLS/auth config passthrough (gRPC only) - Add blob store wiring on both IPC and gRPC transports - Add wait_for_data() delegating to hydrator - Add replica_status status() with peer_count, local_sequence, etc. Compression: - Add state_zstd_compression_codec (production zstd via libzstd) - Register in shared() singleton; update tests for 3 built-in codecs - Add CMake zstd discovery (pkg-config + fallback find_library) Conflict visibility: - Add conflict_entry ring buffer to state_cluster_shard (128 entries) - Add recent_conflicts() accessor (most-recent-first) - Add publish_conflicts() to state_distributed_metrics Hybrid logical clocks: - New state_hybrid_time.h: hybrid_time struct + hybrid_clock class - Add hlc_time field to state_mutation (packed 48-bit wall + 16-bit logical) - Update should_replace() to prefer HLC ordering when both carry timestamps - Serialize hlc_time on IPC wire (backward-compatible) and gRPC proto Hash-range sharding: - New state_hash_partition.h: FNV-1a hash ring with uniform partitioning Selective brick fetch: - Add plane struct and bricks_in_frustum() to state_brick_manifest (AABB-vs-frustum conservative test) Initial-sync snapshot: - Add snapshot() to state_cluster_shard (walks state tree via traverse) - Add request_snapshot() virtual to state_transport base class - Implement on IPC transport (kMsgSnapReq/Rsp wire protocol) Tests: - state_reconnect_resilience_test.cpp: 3 tests (stop/restart/reconnect, peer disconnect survival, connection count tracking) - state_large_tree_bench_test.cpp: 4 benchmarks (10k/100k ingest, drain+publish, snapshot) gated on CVC_DISTRIBUTED_STATE_BENCH=1
P0 — Production Blockers:
- gRPC snapshot + heartbeat frame dispatch in both server and client
- Wire hybrid_clock into shard (stamp outbound, merge inbound)
- IPC wire version negotiation (kMaxVersion, skip unknown msg types)
P1 — Functional Gaps:
- Wire hash_partition into shard with enforce_partition gate
- gRPC heartbeat send/receive (background heartbeat_loop thread)
- Session config: blob_store_path, snapshot_on_join wired
P2 — Test Coverage & Docs:
- 7 new test suites (47 tests): hash_partition, hybrid_time,
brick_frustum, conflict_ring, snapshot_protocol, zstd_round_trip,
session_status
- gRPC reconnect/resilience tests (3 tests)
- USAGE.md: distributed state section (quick start, config
reference, transports, state API, interest filters, snapshots,
conflict resolution, session lifecycle)
- CI: gRPC build matrix entry (libcvc-debug-grpc) for Linux
P3 — Nice to Have:
- require_tls safety boolean on distributed_state_config
Bug fixes:
- hash_partition owner_of() uint32 overflow (UINT32_MAX + 1 wraps)
19 files changed, +1560/-4 lines
…icts
1. Wire max_inline_payload_bytes into state_cluster_shard::drain_local():
- Add set_blob_store() and set_max_inline_payload_bytes() to shard
- In drain_local(), values exceeding the threshold are offloaded
to the blob store and replaced with a blob_ref payload
- Session::join() passes config.max_inline_payload_bytes to shard
2. Auto-publish conflict metrics in session pump loop:
- Store app* context in distributed_state_session for metrics
- Every 100 pump cycles, call publish_conflicts() to write
recent conflict entries under __system.distributed.<cluster>.conflicts.*
- Only active when config.resolve_conflicts is true
Tests:
- InlinePayloadOffloadToBlobStore: verifies large values go to blob store
- InlinePayloadOffloadDisabledByDefault: verifies no offload without blob store
- ConflictAutoPublish: verifies conflicts appear in state tree automatically
6 files changed, +169/-1 lines
…-format - Add zstd to macOS brew install lines in CI workflow - Use IMPORTED_TARGET in pkg_check_modules for zstd to get full library paths (fixes macOS linker error: 'ld: library zstd not found') - Pass steady_clock timestamp to note_seen() in state_transport_grpc.cpp (fixes grpc build: note_seen now requires 2 arguments) - Apply clang-format to all changed lines vs base c297c8a
This was referenced May 23, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements 14 of 16 post-gap-analysis roadmap items for the distributed state synchronization feature. Item #3 (TLS/auth) was already implemented; item #14 (Admin CLI/HTTP) is deferred to upcoming CLI/dashboard work.
Changes
Proto & Wire Format
hlc_timefield added to Mutation (field 12)Transport Chunk Fetch (IPC + gRPC)
set_blob_store(),fetch_chunk()on both transportsSession Improvements
state_data_hydratorwired intodistributed_state_session::join()wait_for_data(path, timeout)APIreplica_status status()with peer_count, local_sequence, pump_cycles, pending_hydrationsCompression
state_zstd_compression_codec(production zstd via libzstd)state_compression_registry::shared()Conflict Visibility
conflict_entryring buffer (128 entries) on shardrecent_conflicts()accessor (most-recent-first)publish_conflicts()onstate_distributed_metricsHybrid Logical Clocks
state_hybrid_time.h:hybrid_timestruct +hybrid_clockclasshlc_timefield onstate_mutation(packed 48-bit wall + 16-bit logical)should_replace()prefers HLC ordering when both mutations carry timestampsHash-Range Sharding
state_hash_partition.h: FNV-1a hash ring withassign_uniform()Selective Brick Fetch
bricks_in_frustum()onstate_brick_manifest(AABB-vs-frustum conservative test)Initial-Sync Snapshot
snapshot(path_prefix)onstate_cluster_shard(walks state tree)request_snapshot()virtual onstate_transportNew Tests
state_reconnect_resilience_test.cpp: StopRestartReconnect, PeerDisconnectNoHang, ConnectionCountTracking (3 tests)state_large_tree_bench_test.cpp: Ingest10k, Ingest100k, DrainAndPublish10k, Snapshot10k (4 bench tests, gated onCVC_DISTRIBUTED_STATE_BENCH=1)Files Changed
25 files, +1423/-6 lines
Test Results
All affected test suites pass: