feat: zero-copy full-tunnel mux + base64 off mux thread (#881)
Merged
therealaleph merged 1 commit into therealaleph:main on May 8, 2026
w0l4i approved these changes on May 8, 2026
## Summary

Performance refactor of the full-tunnel mode hot data path. Two headline wins, both internal: the wire protocol is unchanged, and there are no config or behavior changes.
### 1. Zero-copy reads via `Bytes`/`BytesMut`

`tunnel_loop` and the SOCKS5 UDP receive loop drop their per-iteration `Vec::to_vec()` copies. `MuxMsg::{ConnectData, Data, UdpOpen, UdpData}` now carry `Bytes` instead of `Vec<u8>`/`Arc<Vec<u8>>`; the `Arc::try_unwrap` dance for `pending_client_data` is gone (`Bytes` is already Arc-backed). The TCP path is threshold-based, to avoid a memory regression we identified in review:
- At or above the threshold: `BytesMut::split().freeze()` saves the 64 KB memcpy on hot downloads.
- Below it: `Bytes::copy_from_slice` + `buf.clear()` gives payload-sized retention and reuses the buffer. Without this split, `bytes` 1.x's whole-allocation refcount would pin a full 64 KB per queued tiny read under semaphore stall (worst case ~96 MB on a backpressured tunnel).

The UDP path takes the same lesson, applied earlier: a fixed `Vec<u8>` recv buffer, with `Bytes::copy_from_slice` only after the 9 KB size guard. We tried `recv_buf_from` + `split` first; it pinned the full ~65 KB datagram allocation behind every queued DNS reply.
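A rough, std-only sketch of this retention tradeoff, using `Arc<[u8]>` as a stand-in for `bytes::Bytes` and a hypothetical `SPLIT_THRESHOLD` (the real code uses `BytesMut::split().freeze()`, which also avoids the copy the stand-in still performs):

```rust
use std::sync::Arc;

/// Stand-in for `bytes::Bytes`: a cheaply clonable, refcounted payload.
/// (`Arc::<[u8]>::from` still copies once; only the *retention* behavior
/// of the real `Bytes` paths is modeled here, not the memcpy savings.)
type Payload = Arc<[u8]>;

/// Hypothetical cutoff; the PR picks its own threshold.
const SPLIT_THRESHOLD: usize = 4096;

/// Turn the filled portion of a reusable recv buffer into a queued payload.
fn take_payload(buf: &mut Vec<u8>) -> Payload {
    if buf.len() >= SPLIT_THRESHOLD {
        // Hot path: hand the whole buffer off (analogue of
        // `BytesMut::split().freeze()`); the caller reallocates before
        // the next read.
        Payload::from(std::mem::take(buf))
    } else {
        // Cold path: payload-sized copy, then reuse the buffer. A tiny
        // read queued behind a stalled semaphore now retains only its
        // own bytes instead of pinning the full 64 KB allocation.
        let copied = Payload::from(&buf[..]);
        buf.clear();
        copied
    }
}
```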
### 2. Base64 encoding moved off the single mux thread

A new internal `PendingOp { data: Option<Bytes>, encode_empty: bool }` flows through `mux_loop` carrying raw bytes. The actual `B64.encode(...)` happens in `fire_batch`'s spawned task, after the per-deployment semaphore permit. Up to ~3 MB of encoding per batch (50 ops × 64 KB) no longer serializes the single mux task.
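A std-only sketch of this shape: hex stands in for base64 and a plain thread stands in for the spawned task, so the snippet stays dependency-free; names beyond `PendingOp` are illustrative.

```rust
use std::thread;

/// Mirrors the PR's internal type: raw bytes flow through the mux loop,
/// and encoding is deferred until the batch actually fires.
struct PendingOp {
    data: Option<Vec<u8>>, // `Option<Bytes>` in the real code
    encode_empty: bool,    // empty `connect_data` must still emit `d: ""`
}

/// Stand-in encoder (hex instead of `B64.encode`, to stay std-only).
fn encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// The expensive encode runs on the spawned worker, not the mux thread.
fn fire_batch(ops: Vec<PendingOp>) -> Vec<Option<String>> {
    thread::spawn(move || {
        ops.into_iter()
            .map(|op| match op.data.as_deref() {
                Some(d) if !d.is_empty() => Some(encode(d)),
                _ if op.encode_empty => Some(String::new()),
                _ => None,
            })
            .collect()
    })
    .join()
    .expect("batch worker panicked")
}
```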
### 3. Code quality (drive-bys)

- `BatchAccum::push_or_fire` collapses four near-identical match arms (~25 lines each) into ~10 lines each.
- A `should_fire(pending_len, payload_bytes, op_bytes)` predicate is extracted from the inline cap check, with `saturating_add` so the helper's contract is self-contained instead of relying on caller-side bounds.
- `encode_pending(p) -> BatchOp` is extracted as a free function so the encoding contract (non-empty data → encoded; empty `connect_data` → `Some("")`; anything else empty → `None`) is directly testable.
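A sketch of what the extracted predicate might look like; the cap constants here are assumptions derived from the "50 ops × 64 KB" figure above, not the crate's actual values.

```rust
/// Assumed caps, taken from the batch sizing quoted in this PR.
const MAX_BATCH_OPS: usize = 50;
const MAX_BATCH_PAYLOAD: usize = 50 * 64 * 1024;

/// Should the pending batch fire *before* admitting the next op?
/// `saturating_add` makes the contract total: the helper cannot
/// overflow even if a caller passes unbounded sizes.
fn should_fire(pending_len: usize, payload_bytes: usize, op_bytes: usize) -> bool {
    // The first op never fires an empty batch.
    pending_len != 0
        && (pending_len >= MAX_BATCH_OPS
            || payload_bytes.saturating_add(op_bytes) > MAX_BATCH_PAYLOAD)
}
```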
### Public API change

`TunnelMux::udp_open` and `udp_data` now take `data: impl Into<Bytes>` instead of `Vec<u8>`. Existing callers passing `Vec<u8>`, `&'static [u8]`, `Bytes`, or `BytesMut` all keep compiling; no boundary change for in-tree consumers.
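The widening can be illustrated std-only with a stand-in type; the hypothetical `OwnedBytes` mimics the `From` conversions `bytes::Bytes` already provides.

```rust
/// Stand-in for `bytes::Bytes`, just enough to show the boundary.
#[derive(Debug, PartialEq)]
struct OwnedBytes(Vec<u8>);

impl From<Vec<u8>> for OwnedBytes {
    fn from(v: Vec<u8>) -> Self {
        Self(v)
    }
}
impl From<&[u8]> for OwnedBytes {
    fn from(s: &[u8]) -> Self {
        Self(s.to_vec())
    }
}

/// Mirrors the widened signature: the conversion happens once, at the
/// boundary, and each caller keeps passing whatever it already has.
fn udp_data(data: impl Into<OwnedBytes>) -> usize {
    let bytes: OwnedBytes = data.into();
    bytes.0.len() // the real method would queue `bytes` on the mux
}
```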
### Test plan

- `cargo build --bins --lib` clean
- `cargo test --lib` passes: 208/208 (was 200, +8 new tests)
  - `encode_pending_*` × 4: non-empty data → base64; empty `Data`/`UdpData`/`Close` → `d` omitted; empty `connect_data` → `d: ""`; non-empty `connect_data` → encoded
  - `should_fire_*` × 3: first-op-never-fires; `MAX_BATCH_OPS` boundary; payload-cap boundary
  - `batch_accum_reindexes_after_flush`: post-flush reply indices restart at 0 (regression test for `fire_batch`'s `batch_resp.r.get(idx)` lookup)