Summary
When TLS_BIO_RING_SIZE (CT-out, internal_bio writebuf) is smaller than what the response emit pass tries to push, the connection hangs: client receives 200 + 0 body bytes and waits until timeout. With the current default 64 KiB the bug is latent — one SSL_write fits 3–4 records of 16 KiB plaintext each, so most responses succeed in one go. Reducing the ring exposes it on any body > one record.
Reproduction
- Edit src/core/tls_layer.c so the ciphertext BIO pair uses the small ring on both sides:
    BIO_new_bio_pair(&internal_bio, TLS_BIO_RING_SIZE_SMALL,
                     &network_bio, TLS_BIO_RING_SIZE_SMALL)
- Build and start the server.
- curl --http2 -k -o /dev/null https://localhost:8443/static/app.js against a 200 KiB body.
- Result: code=200 size=0 time=<curl --max-time>. reset.css (8 KiB) works.
Root cause
Two write paths bypass the tls_plaintext_bio queue and call SSL_write directly:
- h2_tls_write_chunk in src/http2/http2_strategy.c:1150 (h2 emit)
- http_connection_tls_fsm_send_plaintext_atomic in src/core/http_connection_tls.c:498 (h1 sendfile, also the h1 parse-error reply)
Both treat TLS_IO_WANT_WRITE (wbio full) as failure and return false. The undelivered tail (h2 stage[] + offset, or the h1 chunk) is discarded; the caller does not re-enter when the wbio drains. The existing tls_drain retry mechanism (cipher_completion → tls_advance_state → tls_drain) only services tls_plaintext_bio — it cannot pick up bytes that never went through that queue.
tls_write_plaintext itself already behaves correctly: it returns wrote only on rc == 1, which would be all-or-nothing without SSL_MODE_ENABLE_PARTIAL_WRITE. That mode is set via SSL_CTX_set_mode, so partial writes are possible, but the callers above still throw the unwritten remainder away.
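To make the failure mode concrete, here is a minimal sketch of the lossy pattern against a mock ring. Everything in it (mock_ring, mock_ssl_write, lossy_write_chunk) is a hypothetical stand-in, not the project's code or the OpenSSL API, and it ignores per-record overhead. It only models "accept what fits, report WANT_WRITE for the rest, and let the caller forget the rest".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ciphertext ring: it only tracks free space. */
typedef struct {
    size_t capacity;
    size_t used;
} mock_ring;

typedef enum { MOCK_IO_OK, MOCK_IO_WANT_WRITE } mock_io_status;

/* Partial-write semantics: accept as much as fits and report WANT_WRITE
 * when some bytes were left over. */
static mock_io_status mock_ssl_write(mock_ring *r, size_t len, size_t *accepted)
{
    size_t room = r->capacity - r->used;
    *accepted = len < room ? len : room;
    r->used += *accepted;
    return *accepted == len ? MOCK_IO_OK : MOCK_IO_WANT_WRITE;
}

/* The pattern described above: WANT_WRITE is surfaced as plain failure,
 * and nothing remembers the (len - delivered) tail for a later retry. */
static bool lossy_write_chunk(mock_ring *r, size_t len, size_t *delivered)
{
    size_t n = 0;
    mock_io_status st = mock_ssl_write(r, len, &n);
    *delivered = n;
    return st == MOCK_IO_OK;
}
```

With a 17 KiB mock ring, an 8 KiB write succeeds in one call, while a 200 KiB write returns false after the first 17 KiB, which mirrors the reset.css versus app.js behaviour in the reproduction.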
Fix sketch (~150–200 LoC)
- h2 emit: when h2_tls_write_chunk hits WANT_WRITE, stash the undelivered slice (stage[] + offset, or a heap copy if the staging buffer is on the call stack). Hook tls_zc_write_done_cb to resume the same write loop after the cipher write to libuv completes. Return success so the upper layer treats the conn as healthy.
- h1 sendfile: rename *_atomic to make the non-atomic semantics explicit, do the same tls_zc_write_done_cb parking on WANT_WRITE, and resume on completion. The existing h1 sendfile state machine already has an H1_SEND_PHASE_TLS_DRAIN phase; reuse it.
- Tests: a phpt under tests/phpt/server/tls/ that lowers TLS_BIO_RING_SIZE via build flag and verifies a body > one record completes.
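The park-and-resume idea from the list above could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed patch: mock_ring, pending_write, pending_write_pump and on_cipher_flushed are hypothetical names, the ring is a mock that only counts bytes, and on_cipher_flushed merely stands in for whatever tls_zc_write_done_cb would do once the ciphertext has been handed to libuv.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ciphertext ring: it only tracks free space. */
typedef struct {
    size_t capacity;
    size_t used;
} mock_ring;

/* Parked write state. `data` must outlive the parked write, which is why
 * a stack-allocated staging buffer would need a heap copy. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t offset;   /* bytes already accepted by the ring */
    bool parked;     /* true while a tail is waiting for the ring to drain */
} pending_write;

/* Push as many bytes as fit; return how many were accepted. */
static size_t mock_ring_push(mock_ring *r, size_t len)
{
    size_t room = r->capacity - r->used;
    size_t n = len < room ? len : room;
    r->used += n;
    return n;
}

/* Advance the write. Instead of failing on a full ring, remember the
 * offset and park. Returns true once everything has been delivered. */
static bool pending_write_pump(pending_write *pw, mock_ring *r)
{
    pw->offset += mock_ring_push(r, pw->len - pw->offset);
    pw->parked = pw->offset < pw->len;
    return !pw->parked;
}

/* Stand-in for the write-completion callback: the ring has been flushed
 * to the socket, so resume the same write from the stored offset. */
static bool on_cipher_flushed(pending_write *pw, mock_ring *r)
{
    r->used = 0;
    return pw->parked ? pending_write_pump(pw, r) : true;
}
```

In this model a 200 KiB body over a 17 KiB ring parks after the first pump and finishes after 11 completions, with the caller seeing a healthy connection throughout rather than a false return.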
Impact
Until fixed, TLS_BIO_RING_SIZE (CT-out) must stay ≥ the largest single SSL_write the emit paths can produce. Empirically 64 KiB is enough for current h2 / h1 sendfile workloads. Once the retry path lands, the ring can shrink (e.g. to 17 KiB) and save ~94 KiB per TLS connection (the asymmetric change in v0.6.0 already drops the unused half — this would drop the in-use half too).
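The sizing numbers can be sanity-checked with a little arithmetic. The sketch below assumes a per-record overhead of 22 bytes (TLS 1.3 with a 16-byte AEAD tag: 5-byte header + 1 content-type byte + 16-byte tag, no padding); that constant and the helper names are assumptions for illustration, and other protocol versions or ciphers would shift the figures slightly.

```c
#include <stddef.h>

enum {
    KIB = 1024,
    TLS_MAX_PLAINTEXT = 16 * KIB,   /* maximum plaintext per TLS record */
    ASSUMED_RECORD_OVERHEAD = 22    /* assumption: 5 header + 1 type + 16 tag */
};

/* How many maximum-size records fit entirely in a ring of ring_bytes. */
static size_t full_records_in_ring(size_t ring_bytes)
{
    return ring_bytes / (TLS_MAX_PLAINTEXT + ASSUMED_RECORD_OVERHEAD);
}

/* Bytes saved per ring when shrinking from old_ring to new_ring. */
static size_t ring_savings_bytes(size_t old_ring, size_t new_ring)
{
    return old_ring - new_ring;
}
```

Under that assumption a 64 KiB ring holds three full-size records, a 17 KiB ring holds exactly one, and a 16 KiB ring holds none, which would explain why 17 KiB is the suggested floor. Shrinking one ring from 64 KiB to 17 KiB saves 47 KiB; the ~94 KiB figure is consistent with counting both halves of the pair, though that reading is an inference from the text above.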
Related
- Asymmetric BIO sizing landed in v0.6.0 (CT-in / PT-app back-channel 17 KiB) — see CHANGELOG.
- Double-destroy fix in v0.6.0 is unrelated but found in the same investigation.