Release v0.29.0 — Sprint 30 close (Tier 1 only) · TOKUJI/BlackBull

Sprint 30 close — event-loop integrity under hostile / burst load
(Tier 1 only). Supersedes the 0.29.0a1 alpha pre-release: the
custom-protocol path (Tier 1.5, PRs #36 / #37 / #38), which shipped in
a1 behind BB_USE_CUSTOM_PROTOCOL=False, was reverted before the final
release after the EC2 cross-check showed it regressed client-side
latency by ~9 % (p50 189 → 207 ms) and throughput by ~8 % at c=4096
on c7i.2xlarge. The code is parked on the
Sprint30-tier1.5-custom-protocol branch for future revisit; it is
not in this release in any form. See Notes for adopters below
for migration guidance from a1.

Added

BB_WRITE_TIMEOUT (default 30 s, 0 disables) — bounds the
time spent in StreamWriter.drain() waiting for the kernel send
buffer to flush. Defends against the slow-read shape of
slowloris: a client that reads the response 1 byte/sec eventually
fills the kernel send buffer and the server's drain blocks
indefinitely without this timeout. On timeout the transport is
force-closed and the failure surfaces as a peer-side
ConnectionResetError for the sender's existing error path.
(PR #33)
BB_MAX_CONNECTIONS graceful 503 response — when the cap is
reached, new connections now receive HTTP/1.1 503 Service Unavailable with Retry-After: 1 before close. Previously the
rejection path silently closed the socket, which load-balancers
interpret as a server crash. ALPN-h2 connections still close
without writing (no SETTINGS exchange yet for clean GOAWAY).
(PR #35)
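
The write-timeout behaviour described above can be sketched roughly as follows. This is a minimal illustration built on asyncio.wait_for, not BlackBull's actual implementation: the function name drain_with_timeout and the exception message are assumptions made for the example, and the timeout argument stands in for the value of BB_WRITE_TIMEOUT.

```python
import asyncio


async def drain_with_timeout(writer: asyncio.StreamWriter, timeout: float) -> None:
    """Wait for the send buffer to flush, but no longer than `timeout` seconds.

    Hypothetical helper: `timeout` plays the role of BB_WRITE_TIMEOUT,
    and 0 disables the bound, as the release notes describe.
    """
    if timeout <= 0:
        # Unbounded wait: a slow reader can park this coroutine indefinitely.
        await writer.drain()
        return
    try:
        await asyncio.wait_for(writer.drain(), timeout)
    except asyncio.TimeoutError:
        # Force-close: abort() discards buffered data and closes the
        # transport without waiting for the peer to read anything.
        writer.transport.abort()
        # Assumption: surface the failure through the same exception type
        # the caller already handles for reset connections.
        raise ConnectionResetError("write timed out; transport aborted") from None
```

A caller would replace a bare `await writer.drain()` with `await drain_with_timeout(writer, 30.0)`, so a client reading 1 byte/sec ties up the connection task for at most the configured window.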

Changed

BB_KEEP_ALIVE_TIMEOUT default lowered from 60 to 5 seconds.
Aligns with the industry-standard short-idle default (uvicorn,
granian, Caddy, Apache, Go net/http — all 5 s; gunicorn 2 s).
60 s was a long-standing outlier that parked ghost / idle
connection tasks in the loop's readuntil for far longer than
necessary, inflating suspended-task count and amplifying drain
time on burst-close. Behaviour change: clients that pause
more than 5 s between requests on a keep-alive connection are disconnected
and must reopen. Set BB_KEEP_ALIVE_TIMEOUT=60 to restore the
prior default. (PR #34)
BB_MAX_CONNECTIONS default raised from 0 (disabled) to
1024 per worker. Unbounded per-worker concurrency lets a
single client, burst, or slowloris-class workload park thousands
of suspended-readuntil tasks on the event loop, amplifying drain
time on burst-close and inflating worst-case latency. 1024 is
the typical ceiling for a single asyncio loop; multi-worker
servers multiply the ceiling (workers × max_connections).
Behaviour change: deployments accepting >1024 concurrent
connections per worker now see HTTP/1.1 503 once the cap is
reached. Set BB_MAX_CONNECTIONS=0 to restore unbounded.
(PR #35)
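
The per-worker cap and its 503 rejection path might look something like the sketch below. It is illustrative only: ConnectionLimiter, handle, and the serve callback are hypothetical names, and the Content-Length and Connection headers are assumptions added so the response is self-delimiting; the release notes only confirm the status line and Retry-After: 1.

```python
import asyncio

# Status line and Retry-After come from the release notes; the other two
# headers are assumptions for this sketch.
REJECT_RESPONSE = (
    b"HTTP/1.1 503 Service Unavailable\r\n"
    b"Retry-After: 1\r\n"
    b"Content-Length: 0\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)


class ConnectionLimiter:
    """Counts live connections for one worker; a limit of 0 means unbounded."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.active = 0

    def try_acquire(self) -> bool:
        if self.limit and self.active >= self.limit:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        self.active -= 1


async def handle(reader, writer, limiter: ConnectionLimiter, serve) -> None:
    """Serve the connection, or answer 503 when the worker is at its cap."""
    if not limiter.try_acquire():
        # Tell the client (or load balancer) this is overload, not a crash.
        writer.write(REJECT_RESPONSE)
        writer.close()
        return
    try:
        await serve(reader, writer)
    finally:
        limiter.release()
        writer.close()
```

With N workers each holding its own limiter, the aggregate ceiling is N × limit, which is the workers × max_connections figure quoted above.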

Fixed

AsyncioWriter.close() no longer awaits wait_closed(). The
synchronous self._sw.close() already initiates the TCP shutdown
and schedules the transport's connection_lost callback. Awaiting
wait_closed() afterwards serialised our connection-actor
coroutine with full transport-close completion, adding 1-3
event-loop turns per connection. Under burst-keepalive workloads
(HttpArena static at c=4096) those extra turns multiplied into
multi-second drains that monopolised the loop and degraded
throughput on back-to-back wrk runs. (PR #32)
ConnectionActor.run drops redundant asyncio.TaskGroup wrap.
Both HTTP/1.1 (HTTP1Actor) and HTTP/2 (HTTP2Actor) run their
protocol-specific logic without spawning sibling tasks at this
level; HTTP/2 manages per-stream tasks via its own internal
TaskGroup inside HTTP2Actor.run(). The outer wrap added no
supervision — just an extra asyncio.Task allocation per
connection (observed 2× alive-task count vs connections in
diagnostic dumps). Replaced with a direct await self._dispatch()
plus a plain except Exception. (PR #32)
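
As a rough illustration of the close() change, the sketch below shows a wrapper that initiates the shutdown and returns immediately. WriterSketch is a hypothetical stand-in, not the real AsyncioWriter; only the `_sw` attribute name and the close() / wait_closed() calls are taken from the notes above.

```python
import asyncio


class WriterSketch:
    """Hypothetical wrapper around an asyncio.StreamWriter."""

    def __init__(self, sw: asyncio.StreamWriter) -> None:
        self._sw = sw

    async def close(self) -> None:
        # StreamWriter.close() is synchronous: it starts the shutdown and
        # the event loop delivers connection_lost on a later iteration.
        self._sw.close()
        # Deliberately no `await self._sw.wait_closed()` here, so the
        # calling coroutine is not suspended until teardown completes.
```

The trade-off is that the caller no longer observes errors raised during teardown; the notes above indicate that, for this server, the saved event-loop turns under burst close were worth more.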

Local benchmark (HttpArena static profile, c=4096, 3 back-to-back wrk runs)

Configuration	Run 1 r/s	Run 2 r/s	Run 3 r/s	Degradation 1→3
Master before Sprint 30 (cap=0)	4,630	4,362	4,048	12.6%
Sprint 30 default (cap=1024, keep-alive 5 s)	4,287	4,173	4,081	4.8%
Same with c=1024 (under cap)	4,704	5,159	5,056	none — runs 2/3 faster

The run 1→3 degradation at c=4096 falls from 12.6% to 4.8%, less than
half of what it was. At c=1024 (the realistic adopter concurrency) it
is eliminated: back-to-back runs 2 and 3 are faster than run 1.
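
For readers who want to check the degradation column, it can be reproduced from the run figures in the table. The helper name degradation_pct is made up for this example; the formula assumed is (run 1 − run 3) / run 1.

```python
def degradation_pct(first: float, last: float) -> float:
    """Percentage drop in throughput from the first run to the last.

    A negative result means the last run was faster than the first.
    """
    return (first - last) / first * 100.0


# Requests per second for runs 1, 2, 3, copied from the table above.
runs = {
    "Master before Sprint 30 (cap=0)": (4630, 4362, 4048),
    "Sprint 30 default (cap=1024, keep-alive 5 s)": (4287, 4173, 4081),
    "Same with c=1024 (under cap)": (4704, 5159, 5056),
}

for name, (run1, _run2, run3) in runs.items():
    print(f"{name}: {degradation_pct(run1, run3):+.1f}%")
```

The first two rows come out at roughly 12.6% and 4.8%; the third is negative, which is what the table reports as "none".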

Tests

9 new unit tests across test_asyncio_writer.py (5 — write-timeout
edge cases) and test_max_connections_503.py (4 — 503-response
shape).

Notes for adopters

Default keepalive 60 s → 5 s aligns with the short-idle defaults cited
under Changed. If your clients legitimately need longer idle periods,
set BB_KEEP_ALIVE_TIMEOUT explicitly.
Default max-connections 0 → 1024 caps per-worker concurrency.
For higher load, set workers=N (multi-worker scales the
ceiling). BB_MAX_CONNECTIONS=0 restores unbounded.
Migrating from 0.29.0a1. The a1 alpha shipped a
BB_USE_CUSTOM_PROTOCOL env var (default off) wiring a custom
asyncio.Protocol subclass. That env var is removed in 0.29.0;
anyone who set it explicitly should unset it. The code is parked
on Sprint30-tier1.5-custom-protocol if you need to keep
experimenting.

Out of scope / deferred

Custom asyncio protocol (_BlackBullProtocol + ProtocolBuffer,
former Tier 1.5). Parked on the Sprint30-tier1.5-custom-protocol
branch. EC2 cross-check (c7i.2xlarge, c=4096, 60 s window)
measured client-side p50 latency 189 → 207 ms (+9 %) and
throughput 5,329 → 4,879 r/s (-8 %) with the toggle on — a
regression, not the local microbenchmark's ~5 % drain-time win.
Removed from the release rather than shipped as opt-in code that
the EC2 evidence says nobody should turn on.
Accept-pausing watermarks (BB_ACCEPT_PAUSE_HIGH/LOW_WATERMARK):
prototyped on the tier2-accept-pausing branch but deferred — the
mechanism works (3× client-side latency reduction in measurement)
but trades throughput in a way that surprises adopters who expect
asyncio servers to be throughput-stable. Branch retained for
future revisit if a priority-scheduling primitive becomes available.
