Skip to content

TLS write deadlock when CT-out BIO ring < response body #29

@EdmondDantes

Description

@EdmondDantes

Summary

When TLS_BIO_RING_SIZE (CT-out, internal_bio writebuf) is smaller than what the response emit pass tries to push, the connection hangs: client receives 200 + 0 body bytes and waits until timeout. With the current default 64 KiB the bug is latent — one SSL_write fits 3–4 records of 16 KiB plaintext each, so most responses succeed in one go. Reducing the ring exposes it on any body > one record.

Reproduction

  1. Edit src/core/tls_layer.c so the ciphertext BIO pair uses the small ring on both sides:
    BIO_new_bio_pair(&internal_bio, TLS_BIO_RING_SIZE_SMALL,
                     &network_bio,  TLS_BIO_RING_SIZE_SMALL)
  2. Build and start the server.
  3. curl --http2 -k -o /dev/null https://localhost:8443/static/app.js against a 200 KiB body.
  4. Result: code=200 size=0 time=<curl --max-time>. reset.css (8 KiB) works.

Root cause

Two write paths bypass the tls_plaintext_bio queue and call SSL_write directly:

  • h2_tls_write_chunk in src/http2/http2_strategy.c:1150 — h2 emit
  • http_connection_tls_fsm_send_plaintext_atomic in src/core/http_connection_tls.c:498 — h1 sendfile (also h1 parse-error reply)

Both treat TLS_IO_WANT_WRITE (wbio full) as failure and return false. The undelivered tail (h2 stage[] + offset, or the h1 chunk) is discarded; the caller does not re-enter when the wbio drains. The existing tls_drain retry mechanism (cipher_completion → tls_advance_state → tls_drain) only services tls_plaintext_bio — it cannot pick up bytes that never went through that queue.

tls_write_plaintext itself was already correctly returning wrote only on rc == 1 (all-or-nothing without SSL_MODE_ENABLE_PARTIAL_WRITE). PARTIAL_WRITE is set at SSL_CTX_set_mode, so partial writes are possible — but the callers above still throw the unwritten remainder away.

Fix sketch (~150–200 LoC)

  1. h2 emit: when h2_tls_write_chunk hits WANT_WRITE, stash the undelivered slice (stage[] + offset, or a heap copy if the staging buffer is on the call stack). Hook tls_zc_write_done_cb to resume the same write loop after the cipher write to libuv completes. Return success so the upper layer treats the conn as healthy.
  2. h1 sendfile: rename *_atomic to make non-atomic semantics explicit, do the same tls_zc_write_done_cb parking on WANT_WRITE, resume on completion. The existing h1 sendfile state machine already has a H1_SEND_PHASE_TLS_DRAIN phase — reuse it.
  3. Tests: a phpt under tests/phpt/server/tls/ that lowers TLS_BIO_RING_SIZE via build flag and verifies a body > one record completes.

Impact

Until fixed, TLS_BIO_RING_SIZE (CT-out) must stay ≥ the largest single SSL_write the emit paths can produce. Empirically 64 KiB is enough for current h2 / h1 sendfile workloads. Once the retry path lands, the ring can shrink (e.g. to 17 KiB) and save ~94 KiB per TLS connection (the asymmetric change in v0.6.0 already drops the unused half — this would drop the in-use half too).

Related

  • Asymmetric BIO sizing landed in v0.6.0 (CT-in / PT-app back-channel 17 KiB) — see CHANGELOG.
  • Double-destroy fix in v0.6.0 is unrelated but found in the same investigation.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No fields configured for Bug.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions