perf(h2,tls): hybrid emit selector — DRAIN small bodies, GATHER large (#30) by EdmondDantes · Pull Request #32 · true-async/server

EdmondDantes · 2026-05-19T10:16:24Z

Summary

New hybrid TLS emit selector for HTTP/2 sessions: small responses take the DRAIN path (mem_send + BIO_write, no gather alloc churn), bodies ≥ 2 KiB or streaming take the GATHER path (NO_COPY refs + one SSL_write_ex).
Streams are pinned on a per-session counter at submit time; emit selects per-pass based on whether any large stream is in flight.
TRUE_ASYNC_H2_TLS_EMIT_MODE env override for A/B testing (`drain` / `gather` / `hybrid` default), read once and cached.
docs/H2_TLS_EMIT_STRATEGIES.md describes the three paths and the arithmetic that picks between them.
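
The pin/select rule above can be sketched as a tiny state machine. This is a minimal illustration, not the code in `src/http2/`: the `sketch_*` names, `session_sketch_t`, and the `was_pinned` flag are hypothetical, and the `>=` comparison follows the "≥ 2 KiB" wording of this summary (the commit message says "exceeds", so the exact boundary is an assumption).

```c
#include <stdbool.h>
#include <stddef.h>

/* 2 KiB threshold from the PR summary; the real constant is
 * H2_TLS_HYBRID_LARGE_THRESHOLD. */
#define SKETCH_LARGE_THRESHOLD 2048u

typedef enum { EMIT_DRAIN, EMIT_GATHER } emit_path_t;

/* Hypothetical stand-in for http2_session_t; only the counter is modelled. */
typedef struct {
    unsigned large_streams_pending;
} session_sketch_t;

/* Called at submit time. Streaming responses (unknown total size) are
 * treated as large. Returns whether the stream was pinned so the close
 * path knows whether to unpin. */
static bool sketch_pin_on_submit(session_sketch_t *s, size_t body_len,
                                 bool streaming)
{
    bool large = streaming || body_len >= SKETCH_LARGE_THRESHOLD;
    if (large) {
        s->large_streams_pending++;
    }
    return large;
}

/* Called when the stream closes; only pinned streams decrement. */
static void sketch_unpin_on_close(session_sketch_t *s, bool was_pinned)
{
    if (was_pinned && s->large_streams_pending > 0) {
        s->large_streams_pending--;
    }
}

/* Evaluated once per emit pass: DRAIN while nothing large is in flight. */
static emit_path_t sketch_select(const session_sketch_t *s)
{
    return s->large_streams_pending == 0 ? EMIT_DRAIN : EMIT_GATHER;
}
```

Because the choice is made per pass rather than per stream, a single large stream in flight moves every frame emitted in that pass onto the GATHER path, including frames of small responses multiplexed on the same session.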

Bench (release PHP, h2 TLS, c=100 m=32, h2load -t 1, 10s × N median)

body	gather	drain	hybrid	best
dyn 3B	162k	235k	243k	hybrid
dyn 16K	58k	43k	57k	hybrid
dyn 64K	18k	11k	18k	hybrid
static 100B	125k	146k	145k	drain ≈ hybrid
static 4K	83k	76k	~83k	gather, hybrid catches it via threshold
static 16K	55k	40k	61k	hybrid
static 64K	17k	12k	17k	hybrid

Profile (perf record -F 999 -g, static 4K): gather lowers `memmove` from 8.6% → 6.8% (one body memcpy vs two in DRAIN) at the cost of +1.1pp `_emalloc` for the gather scratch; the net −0.7pp CPU translates to ~10% RPS.
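
As a worked check of the profile arithmetic, using only the rounded figures quoted above (the two symbols are just the `memmove` and `_emalloc` deltas in percentage points):

```latex
\Delta_{\text{memmove}} = 6.8 - 8.6 = -1.8\ \text{pp}, \qquad
\Delta_{\text{\_emalloc}} = +1.1\ \text{pp}, \qquad
\Delta_{\text{net}} = -1.8 + 1.1 = -0.7\ \text{pp}
```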

Test plan

`make` clean rebuild — green
`tests/phpt` full suite — 165/170 PASS, 4 pre-existing FAIL (TCP fragmentation, ThreadPool / bootloader, static-workers — all fail identically on parent commit `2b4e3e4`), 1 SKIP
h2load alternating drain/hybrid/gather sanity, dynamic 3B / 16K / 64K — hybrid best-of-three

Notes

kTLS Phase 2 has moved out of this PR into issue #31 (Add kernel kTLS support for zero-copy TLS data path) and branch `31-add-kernel-ktls-support-for-zero-copy-tls-data-path`. This PR is pure Phase 1: maxing out the existing memory-BIO async transport.

Documents the two HTTP/2 TLS emit paths and the per-pass selector that sits between them, with the per-strategy memcpy / allocation arithmetic and the bench numbers driving the threshold. Companion to the Phase 1 implementation work on the same issue.

Two TLS emit paths now coexist behind an adaptive selector:

DRAIN: drain nghttp2 via mem_send into a 16 KiB stack buffer and BIO_write straight into the plaintext BIO. No records[] / body_refs[] gather machinery, no per-pass emalloc churn. Wins on short responses where alloc/zval_ptr_dtor cost dominates.

GATHER: drive nghttp2 via session_send + NO_COPY callbacks, fold frames into records[] (with body_refs[] keeping bodies alive), then memcpy everything into stage[] and ship with one SSL_write_ex. Wins on bodies that fill at least one TLS record (amortises cipher setup; only one memcpy of the body instead of the two in DRAIN, mem_send + BIO_write).

Selector lives on http2_session_t::large_streams_pending. Each submit site (dynamic submit_response / submit_response_streaming, static buffered + streaming submit) pins the counter when the response body exceeds H2_TLS_HYBRID_LARGE_THRESHOLD (2 KiB); cb_on_stream_close unpins. Streaming responses with unknown total size are pessimistically treated as large. http2_session_emit takes DRAIN while the counter is zero, GATHER otherwise.

Override the selector with TRUE_ASYNC_H2_TLS_EMIT_MODE = drain | gather | hybrid (default) for A/B testing; env is read once and cached.

Bench (release PHP, h2 TLS, c=100 m=32, h2load -t 1, 10s × N median):

body	gather	drain	hybrid	result
static 100B	125k	146k	145k	drain win (~17%)
static 1K	111k	120k	~120k	drain win (~9%)
static 4K	83k	76k	~83k	gather win (~10%)
static 16K	55k	40k	61k	gather win
static 64K	17k	12k	17k	gather win
dyn 3B	204k	264k	268k	drain win
dyn 16K	70k	54k	75k	gather win
dyn 64K	20k	13k	19k	gather win

Profile diff at static 4K (perf record -F999 -g): gather lowers memmove from 8.57% to 6.75% (one body memcpy vs two in DRAIN), at the cost of +1.14pp _emalloc for the gather scratch arrays; the net −0.7pp CPU translates to the ~10% RPS win.

phpt: server/h2 26/26, server/static+tls 27/28 (pre-existing 004-static-workers failure, unrelated).
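
For the env override, a read-once-and-cache lookup might look like the sketch below. Only the variable name and its three values come from the commit message; the `sketch_*` functions and `emit_mode_t` are hypothetical, falling back to `hybrid` on an unrecognised value is an assumption, and the cache as written is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { MODE_HYBRID, MODE_DRAIN, MODE_GATHER } emit_mode_t;

/* Map the env string to a mode. Unset, "hybrid", and anything
 * unrecognised fall back to hybrid (the fallback is an assumption). */
static emit_mode_t sketch_parse_mode(const char *value)
{
    if (value == NULL) {
        return MODE_HYBRID;
    }
    if (strcmp(value, "drain") == 0) {
        return MODE_DRAIN;
    }
    if (strcmp(value, "gather") == 0) {
        return MODE_GATHER;
    }
    return MODE_HYBRID;
}

/* Read TRUE_ASYNC_H2_TLS_EMIT_MODE on first call and cache the result,
 * so later changes to the environment are ignored. */
static emit_mode_t sketch_emit_mode(void)
{
    static int cached = 0;
    static emit_mode_t mode = MODE_HYBRID;

    if (!cached) {
        mode = sketch_parse_mode(getenv("TRUE_ASYNC_H2_TLS_EMIT_MODE"));
        cached = 1;
    }
    return mode;
}
```

With a cache like this, an A/B run has to set the variable before the server process starts; flipping it while the process is running has no effect.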

github-actions · 2026-05-19T10:22:47Z

Coverage

Total lines: 77.12% → 77.16% (+0.04 pp)

File	Baseline	Current	Δ	Touched
`src/http2/http2_session.c`	86.03%	87.14%	+1.12 pp	●
`src/http2/http2_static_response.c`	71.72%	71.67%	-0.04 pp	●
`src/http2/http2_strategy.c`	72.22%	72.54%	+0.32 pp	●
`src/http3/http3_callbacks.c`	79.21%	78.61%	-0.59 pp

EdmondDantes added 2 commits May 19, 2026 07:15

EdmondDantes linked an issue May 19, 2026 that may be closed by this pull request

Performance opt 2 #30

Closed

4 tasks

EdmondDantes merged commit 8a8a4cc into main May 19, 2026
5 checks passed

EdmondDantes deleted the 30-performance-opt-2 branch May 19, 2026 10:26
