v0.29.0 — Sprint 30 close (Tier 1 only)
Sprint 30 close — event-loop integrity under hostile / burst load
(Tier 1 only). Supersedes the 0.29.0a1 alpha pre-release. The
custom-protocol path (Tier 1.5, PRs #36 / #37 / #38), which shipped in a1
behind BB_USE_CUSTOM_PROTOCOL=False, was reverted before the
final release after the EC2 cross-check showed it regressed client-side
latency by ~9 % (p50 189 → 207 ms) and throughput by ~8 % at c=4096
on c7i.2xlarge. The code is parked on the
Sprint30-tier1.5-custom-protocol branch for future revisit; it is
not in this release in any form. See Notes for adopters below
for migration guidance from a1.
Added
- BB_WRITE_TIMEOUT (default 30 s, 0 disables): bounds the
  time spent in StreamWriter.drain() waiting for the kernel send
  buffer to flush. Defends against the slow-read shape of
  slowloris: a client that reads the response 1 byte/sec eventually
  fills the kernel send buffer and the server's drain blocks
  indefinitely without this timeout. On timeout the transport is
  force-closed and the failure surfaces as a peer-side
  ConnectionResetError for the sender's existing error path.
  (PR #33)
- BB_MAX_CONNECTIONS graceful 503 response: when the cap is
  reached, new connections now receive HTTP/1.1 503 Service
  Unavailable with Retry-After: 1 before the close. Previously the
  rejection path silently closed the socket, which load balancers
  interpret as a server crash. ALPN-h2 connections still close
  without writing (no SETTINGS exchange yet for clean GOAWAY).
  (PR #35)
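The write-timeout behaviour described above can be sketched as follows. This is a minimal illustration, not the code from PR #33: the helper name drain_with_timeout is hypothetical, and it assumes the timeout is applied with asyncio.wait_for and the transport is force-closed with transport.abort().

```python
import asyncio


async def drain_with_timeout(writer: asyncio.StreamWriter, timeout: float) -> None:
    """Flush the write buffer, giving up after `timeout` seconds (0 disables).

    Hypothetical helper: the name and signature are illustrative only.
    """
    if timeout <= 0:
        # Timeout disabled: wait for the kernel send buffer indefinitely.
        await writer.drain()
        return
    try:
        await asyncio.wait_for(writer.drain(), timeout)
    except asyncio.TimeoutError:
        # The peer is not reading. Drop the connection without flushing
        # and surface the failure through the usual reset error path.
        writer.transport.abort()
        raise ConnectionResetError("write timed out") from None
```

A caller that already handles ConnectionResetError on writes needs no new error branch; the slow reader looks like any other dropped peer.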
Changed
- BB_KEEP_ALIVE_TIMEOUT default lowered from 60 to 5 seconds.
  Aligns with the industry-standard short-idle default (uvicorn,
  granian, Caddy, Apache, Go net/http: all 5 s; gunicorn 2 s).
  60 s was a long-standing outlier that parked ghost / idle
  connection tasks in the loop's readuntil for far longer than
  necessary, inflating suspended-task count and amplifying drain
  time on burst-close. Behaviour change: keep-alive connections
  that stay idle for more than 5 s between requests are closed,
  and the client must reconnect. Set BB_KEEP_ALIVE_TIMEOUT=60 to
  restore the prior default. (PR #34)
- BB_MAX_CONNECTIONS default raised from 0 (disabled) to
  1024 per worker. Unbounded per-worker concurrency lets a
  single client, burst, or slowloris-class workload park thousands
  of suspended readuntil tasks on the event loop, amplifying drain
  time on burst-close and inflating worst-case latency. 1024 is
  the typical ceiling for a single asyncio loop; multi-worker
  servers multiply the ceiling (workers × max_connections).
  Behaviour change: deployments accepting >1024 concurrent
  connections per worker now see HTTP/1.1 503 once the cap is
  reached. Set BB_MAX_CONNECTIONS=0 to restore unbounded.
  (PR #35)
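The cap-and-reject behaviour can be sketched roughly as below. This is an illustration under stated assumptions, not the code from PR #35: ConnectionLimiter, handle, and serve are hypothetical names, and the Content-Length and Connection headers are additions that the release notes do not specify.

```python
import asyncio

# Status line and Retry-After come from the notes above; the other two
# headers are assumptions added to make the response self-delimiting.
REJECT_503 = (
    b"HTTP/1.1 503 Service Unavailable\r\n"
    b"Retry-After: 1\r\n"
    b"Content-Length: 0\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)


class ConnectionLimiter:
    """Hypothetical per-worker cap; max_connections=0 means unbounded."""

    def __init__(self, max_connections: int = 1024) -> None:
        self.max_connections = max_connections
        self.active = 0

    def try_acquire(self) -> bool:
        if self.max_connections and self.active >= self.max_connections:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        self.active -= 1


async def handle(reader, writer, limiter: ConnectionLimiter, serve) -> None:
    """Serve one connection, or answer 503 when the worker is at its cap."""
    if not limiter.try_acquire():
        writer.write(REJECT_503)  # tell the client to retry, then close
        writer.close()
        return
    try:
        await serve(reader, writer)
    finally:
        limiter.release()
        writer.close()
```

Because the counter is only touched from the event-loop thread, no lock is needed in this sketch.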
Fixed
- AsyncioWriter.close() no longer awaits wait_closed(). The
  synchronous self._sw.close() already initiates the TCP shutdown
  and schedules the transport's connection_lost callback. Awaiting
  wait_closed() afterwards serialised our connection-actor
  coroutine with full transport-close completion, adding 1-3
  event-loop turns per connection. Under burst-keepalive workloads
  (HttpArena static at c=4096) those extra turns multiplied into
  multi-second drains that monopolised the loop and degraded
  throughput on back-to-back wrk runs. (PR #32)
- ConnectionActor.run drops the redundant asyncio.TaskGroup wrap.
  Both HTTP/1.1 (HTTP1Actor) and HTTP/2 (HTTP2Actor) run their
  protocol-specific logic without spawning sibling tasks at this
  level; HTTP/2 manages per-stream tasks via its own internal
  TaskGroup inside HTTP2Actor.run(). The outer wrap added no
  supervision, just an extra asyncio.Task allocation per
  connection (observed 2× alive-task count vs connections in
  diagnostic dumps). Replaced with a direct await self._dispatch()
  plus a plain except Exception. (PR #32)
Local benchmark (HttpArena static profile, c=4096, 3 back-to-back wrk runs)
| Configuration | Run 1 r/s | Run 2 r/s | Run 3 r/s | Degradation 1→3 |
|---|---|---|---|---|
| Master before Sprint 30 (cap=0) | 4,630 | 4,362 | 4,048 | 12.6% |
| Sprint 30 default (cap=1024, keep-alive 5 s) | 4,287 | 4,173 | 4,081 | 4.8% |
| Same with c=1024 (under cap) | 4,704 | 5,159 | 5,056 | none — runs 2/3 faster |
At c=4096 the run 1→3 degradation drops from 12.6% to 4.8%, less
than half of the pre-Sprint-30 figure. At c=1024 (the realistic
adopter concurrency) it is eliminated: runs 2 and 3 are faster
than run 1.
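For reference, the Degradation 1→3 column is the relative throughput drop from run 1 to run 3. The short calculation below recomputes it from the r/s figures in the table; the function name is illustrative.

```python
def degradation(run1: float, run3: float) -> float:
    """Percent of run-1 throughput lost by run 3 (negative means faster)."""
    return (run1 - run3) / run1 * 100


before = degradation(4630, 4048)     # master before Sprint 30, c=4096
after = degradation(4287, 4081)      # Sprint 30 defaults, c=4096
under_cap = degradation(4704, 5056)  # Sprint 30 defaults, c=1024

print(f"{before:.1f}% -> {after:.1f}%")  # 12.6% -> 4.8%
print(f"{under_cap:.1f}%")               # -7.5%
```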
Tests
- 9 new unit tests across test_asyncio_writer.py (5, covering
  write-timeout edge cases) and test_max_connections_503.py (4,
  covering the 503-response shape).
Notes for adopters
- Default keepalive 60 s → 5 s matches every other major HTTP
  server. If your clients legitimately need longer idle periods,
  set BB_KEEP_ALIVE_TIMEOUT explicitly.
- Default max-connections 0 → 1024 caps per-worker concurrency.
  For higher load, set workers=N (multi-worker scales the
  ceiling). BB_MAX_CONNECTIONS=0 restores unbounded.
- Migrating from 0.29.0a1. The a1 alpha shipped a
  BB_USE_CUSTOM_PROTOCOL env var (default off) wiring a custom
  asyncio.Protocol subclass. That env var is removed in 0.29.0;
  anyone who set it explicitly should unset it. The code is parked
  on Sprint30-tier1.5-custom-protocol if you need to keep
  experimenting.
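As a rough sizing aid for the workers × max_connections ceiling, the sketch below estimates how many workers cover a target concurrency. The helper workers_needed is hypothetical, and it assumes connections spread evenly across workers, which real accept distribution may not guarantee.

```python
import math


def workers_needed(target_connections: int, max_connections: int = 1024) -> int:
    """Smallest worker count whose combined cap covers target_connections.

    Hypothetical helper; assumes an even spread of connections per worker.
    """
    if max_connections == 0:
        return 1  # cap disabled: a single worker is already unbounded
    return max(1, math.ceil(target_connections / max_connections))


print(workers_needed(4096))  # 4
```

Under that even-spread assumption, a deployment expecting 4096 concurrent connections would need four workers at the default cap before any client sees a 503.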
Out of scope / deferred
- Custom asyncio protocol (_BlackBullProtocol + ProtocolBuffer,
  former Tier 1.5). Parked on the Sprint30-tier1.5-custom-protocol
  branch. EC2 cross-check (c7i.2xlarge, c=4096, 60 s window)
  measured client-side p50 latency 189 → 207 ms (+9 %) and
  throughput 5,329 → 4,879 r/s (-8 %) with the toggle on: a
  regression, not the local microbenchmark's ~5 % drain-time win.
  Removed from the release rather than shipped as opt-in code that
  the EC2 evidence says nobody should turn on.
- Accept-pausing watermarks (BB_ACCEPT_PAUSE_HIGH/LOW_WATERMARK):
  prototyped on the tier2-accept-pausing branch but deferred. The
  mechanism works (3× client-side latency reduction in measurement)
  but trades throughput in a way that surprises adopters who expect
  asyncio servers to be throughput-stable. Branch retained for
  future revisit if a priority-scheduling primitive becomes available.