
v0.29.0 — Sprint 30 close (Tier 1 only)


TOKUJI released this 04 Jun 03:31 (commit 4f51f73; 43 commits to
master since this release)

Sprint 30 close: event-loop integrity under hostile / burst load
(Tier 1 only).
Supersedes the 0.29.0a1 alpha pre-release. The custom-protocol path
(Tier 1.5, PRs #36 / #37 / #38), which shipped in a1 behind
BB_USE_CUSTOM_PROTOCOL=False, was reverted before the final release
after the EC2 cross-check on c7i.2xlarge at c=4096 showed it
regressed client-side latency by ~9 % (p50 189 → 207 ms) and
throughput by ~8 %. The code is parked on the
Sprint30-tier1.5-custom-protocol branch for a future revisit; it is
not in this release in any form. See Notes for adopters below for
migration guidance from a1.

Added

  • BB_WRITE_TIMEOUT (default 30 s, 0 disables) bounds the time
    spent in StreamWriter.drain() waiting for the kernel send buffer
    to flush. It defends against the slow-read variant of slowloris:
    a client that reads the response at 1 byte/sec eventually fills
    the kernel send buffer, and without this timeout the server's
    drain() blocks indefinitely. On timeout the transport is
    force-closed and the failure surfaces as a ConnectionResetError,
    as if the peer had reset, so the sender's existing error path
    handles it. (PR #33)
  • BB_MAX_CONNECTIONS graceful 503 response: when the cap is
    reached, new connections now receive HTTP/1.1 503 Service
    Unavailable with Retry-After: 1 before close. Previously the
    rejection path silently closed the socket, which load balancers
    interpret as a server crash. ALPN-h2 connections still close
    without writing anything, since there is no SETTINGS exchange yet
    that would allow a clean GOAWAY. (PR #35)
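A minimal sketch of both Added items, assuming plain asyncio StreamWriter objects. It shows one way to bound drain() (asyncio.wait_for plus transport.abort()) and what the over-cap 503 could look like on the wire. drain_with_timeout, reject_over_cap, and REJECT_503 are hypothetical names, not the project's code; the Content-Length and Connection: close headers and the 1 s drain bound on the 503 write are assumptions beyond what these notes state.

```python
import asyncio

WRITE_TIMEOUT_S = 30.0  # assumed to mirror the BB_WRITE_TIMEOUT default; 0 disables

# Assumed wire shape of the over-cap rejection. The status line and
# Retry-After come from the notes; Content-Length and Connection: close
# are assumptions.
REJECT_503 = (
    b"HTTP/1.1 503 Service Unavailable\r\n"
    b"Retry-After: 1\r\n"
    b"Content-Length: 0\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)


async def drain_with_timeout(writer, timeout: float = WRITE_TIMEOUT_S) -> None:
    """Hypothetical helper: bound writer.drain() so a slow reader cannot park us."""
    if timeout <= 0:
        await writer.drain()  # 0 disables the bound
        return
    try:
        await asyncio.wait_for(writer.drain(), timeout)
    except asyncio.TimeoutError:
        # The kernel send buffer never flushed in time: drop the connection
        # without flushing and report it the way a peer reset would look.
        writer.transport.abort()
        raise ConnectionResetError("write timeout: peer is not reading") from None


async def reject_over_cap(writer, is_h2: bool) -> None:
    """Hypothetical helper: answer an over-cap connection, then close it."""
    if not is_h2:
        # HTTP/1.1: tell the client (or load balancer) to retry shortly.
        writer.write(REJECT_503)
        try:
            await drain_with_timeout(writer, 1.0)
        except ConnectionResetError:
            return  # transport was already aborted
    # ALPN-h2: close without writing, as the notes describe.
    writer.close()
```

Raising ConnectionResetError rather than a new exception type keeps the timeout on the same error path as a genuine peer reset, which matches the behaviour the notes describe.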

Changed

  • BB_KEEP_ALIVE_TIMEOUT default lowered from 60 to 5 seconds.
    This aligns with the short idle defaults of other servers such as
    uvicorn, granian, and Apache (5 s) and gunicorn (2 s). 60 s was a
    long-standing outlier that parked ghost / idle connection tasks in
    the loop's readuntil far longer than necessary, inflating the
    suspended-task count and amplifying drain time on burst-close.
    Behaviour change: clients that pause longer than 5 s between
    requests on a keep-alive connection will be closed and must
    reconnect. Set BB_KEEP_ALIVE_TIMEOUT=60 to restore the prior
    default. (PR #34)

  • BB_MAX_CONNECTIONS default raised from 0 (disabled) to 1024 per
    worker. Unbounded per-worker concurrency lets a single client, a
    burst, or a slowloris-class workload park thousands of tasks
    suspended in readuntil on the event loop, amplifying drain time on
    burst-close and inflating worst-case latency. 1024 is the typical
    ceiling for a single asyncio loop; multi-worker servers multiply
    the ceiling (workers × max_connections). Behaviour change:
    deployments accepting more than 1024 concurrent connections per
    worker now see HTTP/1.1 503 once the cap is reached. Set
    BB_MAX_CONNECTIONS=0 to restore unbounded concurrency. (PR #35)
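A sketch of the two Changed defaults under stated assumptions. The idle timeout is modelled as asyncio.wait_for around StreamReader.readuntil (the notes say idle tasks park in readuntil, but not exactly how the timeout is applied), and the cap as a simple per-worker counter. read_next_request_head, ConnectionCap, and total_ceiling are hypothetical; only the env var names and their defaults come from the notes.

```python
import asyncio
import os

# Hypothetical config reading: env var names and defaults are from the
# release notes, the parsing itself is an assumption.
KEEP_ALIVE_TIMEOUT_S = float(os.environ.get("BB_KEEP_ALIVE_TIMEOUT", "5"))
MAX_CONNECTIONS = int(os.environ.get("BB_MAX_CONNECTIONS", "1024"))


async def read_next_request_head(reader: asyncio.StreamReader,
                                 idle_timeout: float = KEEP_ALIVE_TIMEOUT_S):
    """Wait for the next request head on a keep-alive connection.

    Returns None if the client stays idle past idle_timeout or closes the
    connection, so the caller can close instead of parking in readuntil.
    """
    try:
        return await asyncio.wait_for(reader.readuntil(b"\r\n\r\n"), idle_timeout)
    except (asyncio.TimeoutError, asyncio.IncompleteReadError):
        return None


class ConnectionCap:
    """Hypothetical per-worker admission counter; cap 0 means unbounded."""

    def __init__(self, cap: int = MAX_CONNECTIONS) -> None:
        self.cap = cap
        self.active = 0

    def try_admit(self) -> bool:
        if self.cap and self.active >= self.cap:
            return False  # caller sends the 503 and closes
        self.active += 1
        return True

    def release(self) -> None:
        self.active -= 1


def total_ceiling(workers: int, cap: int = MAX_CONNECTIONS) -> int:
    """Server-wide ceiling: workers x max_connections (0 stays unbounded)."""
    return workers * cap
```

For example, four workers at the new default give total_ceiling(4, 1024) == 4096 concurrent connections before any worker starts answering with 503.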

Fixed

  • AsyncioWriter.close() no longer awaits wait_closed(). The
    synchronous self._sw.close() already initiates the TCP shutdown
    and schedules the transport's connection_lost callback. Awaiting
    wait_closed() afterwards made the connection-actor coroutine wait
    for the transport close to complete fully, adding 1–3 event-loop
    turns per connection. Under burst keep-alive workloads (HttpArena
    static at c=4096) those extra turns added up to multi-second
    drains that monopolised the loop and degraded throughput across
    back-to-back wrk runs. (PR #32)
  • ConnectionActor.run drops the redundant asyncio.TaskGroup wrap.
    Neither HTTP/1.1 (HTTP1Actor) nor HTTP/2 (HTTP2Actor) spawns
    sibling tasks at this level; HTTP/2 manages per-stream tasks via
    its own internal TaskGroup inside HTTP2Actor.run(). The outer wrap
    therefore added no supervision, only an extra asyncio.Task
    allocation per connection (diagnostic dumps showed twice as many
    live tasks as connections). It is replaced with a direct
    await self._dispatch() plus a plain except Exception. (PR #32)
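A sketch of the two Fixed items using hypothetical stand-in classes (AsyncioWriterSketch, ConnectionActorSketch) rather than the real AsyncioWriter and ConnectionActor. The logging and teardown that the real except branch would do are assumed; here the error is simply recorded.

```python
import asyncio
from typing import Optional


class AsyncioWriterSketch:
    """Hypothetical stand-in for AsyncioWriter; only close() is shown."""

    def __init__(self, sw: asyncio.StreamWriter) -> None:
        self._sw = sw

    async def close(self) -> None:
        # StreamWriter.close() is synchronous: it starts the TCP shutdown and
        # the transport schedules connection_lost by itself. Deliberately no
        # `await self._sw.wait_closed()`, so this coroutine does not spend
        # extra loop turns waiting for the transport close to complete.
        self._sw.close()


class ConnectionActorSketch:
    """Hypothetical stand-in for ConnectionActor.run after PR #32."""

    last_error: Optional[Exception] = None

    async def _dispatch(self) -> None:
        # Protocol logic (HTTP1Actor / HTTP2Actor) would run here.
        raise NotImplementedError

    async def run(self) -> None:
        # No asyncio.TaskGroup around this await: nothing else is spawned at
        # this level, so a direct await avoids one extra Task per connection.
        try:
            await self._dispatch()
        except Exception as exc:
            # The real code would log and tear down; here we just record it.
            self.last_error = exc
```

Because run() awaits _dispatch() in the caller's own task, a connection costs one Task instead of two, which is the 2× alive-task count the notes observed.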

Local benchmark (HttpArena static profile, c=4096, 3 back-to-back wrk runs)

| Configuration | Run 1 (r/s) | Run 2 (r/s) | Run 3 (r/s) | Degradation, run 1 → 3 |
|---|---|---|---|---|
| master before Sprint 30 (cap=0) | 4,630 | 4,362 | 4,048 | 12.6 % |
| Sprint 30 defaults (cap=1024, keep-alive 5 s) | 4,287 | 4,173 | 4,081 | 4.8 % |
| Sprint 30 defaults, c=1024 (under cap) | 4,704 | 5,159 | 5,056 | none (runs 2 and 3 faster) |

The run 1 → run 3 cliff at c=4096 is cut by more than half
(12.6 % → 4.8 %). At c=1024 (the realistic adopter concurrency) it
is eliminated: back-to-back runs 2 and 3 are faster than run 1.
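The Degradation column appears to be the relative drop from run 1 to run 3. A short sketch, assuming that definition, reproduces it from the throughput figures in the table:

```python
def degradation_pct(run1: float, run3: float) -> float:
    """Percentage drop from run 1 to run 3 (negative means run 3 was faster)."""
    return (run1 - run3) / run1 * 100


# (run 1, run 3) throughput in r/s, taken from the table above.
rows = {
    "master before Sprint 30 (cap=0)": (4630, 4048),
    "Sprint 30 defaults, c=4096": (4287, 4081),
    "Sprint 30 defaults, c=1024": (4704, 5056),
}

for name, (r1, r3) in rows.items():
    print(f"{name}: {degradation_pct(r1, r3):.1f} %")
```

Running it prints 12.6 %, 4.8 %, and -7.5 % for the three rows; the negative value at c=1024 is the "runs 2 and 3 faster" case.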

Tests

  • 9 new unit tests across test_asyncio_writer.py (5 — write-timeout
    edge cases) and test_max_connections_503.py (4 — 503-response
    shape).

Notes for adopters

  • Default keep-alive 60 s → 5 s is in line with the short idle
    defaults of servers such as uvicorn, Apache, and gunicorn. If your
    clients legitimately need longer idle periods, set
    BB_KEEP_ALIVE_TIMEOUT explicitly.
  • Default max-connections 0 → 1024 caps per-worker concurrency.
    For higher load, set workers=N (multi-worker scales the
    ceiling). BB_MAX_CONNECTIONS=0 restores unbounded.
  • Migrating from 0.29.0a1. The a1 alpha shipped a
    BB_USE_CUSTOM_PROTOCOL env var (default off) that wired in a
    custom asyncio.Protocol subclass. That env var is removed in
    0.29.0; anyone who set it explicitly should unset it. The code is
    parked on the Sprint30-tier1.5-custom-protocol branch if you want
    to keep experimenting.
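Adopters who configure the server through the environment could add a small startup check. This is a hypothetical adopter-side sketch, not part of the release: the env var names and values come from these notes, while check_removed_env, restore_previous_defaults, and the assumption that the server reads its environment after this code runs are illustrative.

```python
import os
import warnings
from typing import List, MutableMapping

# Env var names and values come from these notes; the helpers themselves are
# hypothetical adopter-side checks, not part of the release.
REMOVED_IN_0_29_0 = ("BB_USE_CUSTOM_PROTOCOL",)
PRE_0_29_DEFAULTS = {"BB_KEEP_ALIVE_TIMEOUT": "60", "BB_MAX_CONNECTIONS": "0"}


def check_removed_env(environ: MutableMapping[str, str] = os.environ) -> List[str]:
    """Warn about env vars that 0.29.0 removed and return their names."""
    stale = [name for name in REMOVED_IN_0_29_0 if name in environ]
    for name in stale:
        warnings.warn(f"{name} was removed in 0.29.0; unset it")
    return stale


def restore_previous_defaults(environ: MutableMapping[str, str] = os.environ) -> None:
    """Opt back into the pre-0.29.0 keep-alive and connection-cap defaults,
    without overriding values the operator already set explicitly."""
    for name, value in PRE_0_29_DEFAULTS.items():
        environ.setdefault(name, value)
```

Using setdefault keeps any explicit operator setting, so calling restore_previous_defaults() is only a fallback for deployments that relied on the old 60 s / unbounded behaviour.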

Out of scope / deferred

  • Custom asyncio protocol (_BlackBullProtocol + ProtocolBuffer,
    formerly Tier 1.5). Parked on the Sprint30-tier1.5-custom-protocol
    branch. The EC2 cross-check (c7i.2xlarge, c=4096, 60 s window)
    measured client-side p50 latency 189 → 207 ms (+9 %) and
    throughput 5,329 → 4,879 r/s (-8 %) with the toggle on: a
    regression, rather than the ~5 % drain-time win the local
    microbenchmark showed. It was removed from the release rather than
    shipped as opt-in code that, per the EC2 evidence, nobody should
    turn on.
  • Accept-pausing watermarks (BB_ACCEPT_PAUSE_HIGH/LOW_WATERMARK):
    prototyped on the tier2-accept-pausing branch but deferred. The
    mechanism works (3× lower client-side latency in measurement) but
    trades away throughput in a way that surprises adopters who expect
    asyncio servers to be throughput-stable. The branch is retained for
    a future revisit if a priority-scheduling primitive becomes
    available.
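The high/low watermark idea can be illustrated as simple hysteresis. This is a hypothetical sketch only: the AcceptPauseController name, and the assumption that BB_ACCEPT_PAUSE_HIGH/LOW_WATERMARK map to `high`/`low` active-connection counts, are not taken from the tier2-accept-pausing branch, and how that branch actually stops accepting is not described in these notes.

```python
class AcceptPauseController:
    """Hypothetical high/low watermark hysteresis for accept pausing.

    Pause accepting once `active` reaches `high`; resume only after it has
    fallen to `low`. The gap between the two prevents rapid pause/resume
    flapping around a single threshold.
    """

    def __init__(self, high: int, low: int) -> None:
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.high = high
        self.low = low
        self.paused = False

    def update(self, active: int) -> bool:
        """Feed the current connection count; return True while paused."""
        if not self.paused and active >= self.high:
            self.paused = True
        elif self.paused and active <= self.low:
            self.paused = False
        return self.paused
```

With high=100 and low=80, a count of 90 keeps accepting paused if it was already paused but does not trigger a pause on its own. That band is also where the throughput trade-off comes from: new connections wait in the kernel backlog instead of being accepted.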