fix(vtz): node:http hang — graceful shutdown + sync .then() bug (resolves #2718, #2720)#2749

Merged: viniciusdacal merged 1 commit into main from fix/vtz-node-http-hang on Apr 17, 2026

@viniciusdacal
Contributor

Root cause

The synthetic node:http module's createServer().listen() treated globalThis.__vtz_http.serve() as asynchronous — but serve() is a synchronous op that returns the server object directly. The resulting .then(...) on a non-thenable threw a TypeError that was swallowed by the idiomatic new Promise((resolve) => server.listen(0, resolve)), so the listen callback never fired and tests hung at the 120 s watchdog. The CJS require('http') shim had the same bug.
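
To make the failure mode concrete, here is a minimal sketch of the buggy and fixed `listen()` shapes. The `serve()` function below is a hypothetical stand-in for `globalThis.__vtz_http.serve()`, and `listenBuggy` / `listenFixed` are illustrative names rather than the shim's actual code. The sketch only assumes what the description states: `serve()` returns the server object synchronously.

```typescript
// Hypothetical stand-in for globalThis.__vtz_http.serve(): a synchronous
// call that returns the server object directly (not a promise).
type NativeServer = { id: number; port: number };

function serve(port: number): NativeServer {
  return { id: 1, port };
}

// Buggy shape: treats the synchronous return value as a promise.
// The object has no `then` method, so this line throws a TypeError
// and the callback is never scheduled.
function listenBuggy(port: number, cb: () => void): void {
  (serve(port) as unknown as Promise<NativeServer>).then(() => cb());
}

// Fixed shape: use the returned server directly and defer the callback
// to a microtask so callers still observe it asynchronously.
function listenFixed(port: number, cb: () => void): NativeServer {
  const server = serve(port);
  queueMicrotask(cb);
  return server;
}

// Calling the buggy version throws before the callback can ever run.
let buggyError: unknown = null;
try {
  listenBuggy(0, () => {});
} catch (err) {
  buggyError = err; // a TypeError: `.then` is not a function
}
```

In the fixed shape the callback never runs inside the `listen()` call itself, so code that registers handlers right after calling `listen()` still works.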

Fixing the shim exposed three more defects in the underlying op layer:

  1. op_http_serve_close aborted the tokio task immediately, cancelling in-flight axum handler futures mid-response so fetch() never got its reply.
  2. op_http_serve_respond kept the pending-responses map per server, so replying after close (which removed the ServerInstance from state) silently dropped the response.
  3. op_http_serve_accept / op_http_serve_respond threw "Unknown server id" when the accept loop re-polled after close, poisoning the event loop and corrupting unrelated tests.

What changed

Rust (native/vtz/src/runtime/ops/http_serve.rs)

  • Replaced the abort-handle teardown with axum's with_graceful_shutdown(...) signal so existing connections drain cleanly.
  • Moved pending_responses onto HttpServeState and keyed it globally by request_id so in-flight replies still work across close.
  • Made op_http_serve_accept and op_http_serve_respond tolerate a missing server id (null / no-op) instead of throwing.
  • Added two unit tests covering graceful shutdown of an in-flight response and the no-error contract after close.
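
The bookkeeping change can be modelled outside Rust. The sketch below is a TypeScript model, not the code in http_serve.rs: the class name `HttpServeStateModel` and its methods are hypothetical. It shows a pending-reply map keyed by request id on the shared state, plus accept and respond paths that return null or do nothing for a server that is already gone.

```typescript
type Reply = { status: number; body: string };

// Illustrative model only: names and shapes are assumptions, not the Rust API.
class HttpServeStateModel {
  private servers = new Set<number>();
  // Keyed globally by request id, so entries survive removal of the server.
  private pending = new Map<number, (reply: Reply) => void>();
  private nextRequestId = 1;

  open(serverId: number): void {
    this.servers.add(serverId);
  }

  // Closing removes the server but leaves pending replies in place.
  close(serverId: number): void {
    this.servers.delete(serverId);
  }

  // Registers an in-flight request; returns null for an unknown server
  // instead of throwing.
  accept(serverId: number, deliver: (reply: Reply) => void): number | null {
    if (!this.servers.has(serverId)) return null;
    const requestId = this.nextRequestId++;
    this.pending.set(requestId, deliver);
    return requestId;
  }

  // Delivers a reply by request id; an unknown id is a no-op.
  respond(requestId: number, reply: Reply): boolean {
    const deliver = this.pending.get(requestId);
    if (deliver === undefined) return false;
    this.pending.delete(requestId);
    deliver(reply);
    return true;
  }
}
```

Because the lookup in `respond` never touches the server set, a reply issued after `close` still reaches the waiting request in this model.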

JS (native/vtz/src/runtime/module_loader.rs, both NODE_HTTP_MODULE and the CJS http shim)

  • Treat __vtz_http.serve() as synchronous.
  • listen(cb) fires the callback via queueMicrotask (proper Node-async semantics).
  • close(cb) defers the callback until all in-flight requests finish; connections received after close() return 503 without entering the user handler.
  • Added the stock no-op event-emitter surface (on, once, off, emit, unref, ref) that callers expect.
  • address() returns null before listen() completes, matching Node.
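
The close semantics described above can be sketched as a small in-flight counter. This is an illustrative model with hypothetical names (`DrainTracker`, `tryBegin`, `end`), not the shim's source. For brevity it invokes the close callback synchronously once the last request finishes, whereas the real shim may defer it.

```typescript
// Illustrative model of deferred close; all names here are assumptions.
class DrainTracker {
  private inFlight = 0;
  private closing = false;
  private onClosed: (() => void) | null = null;

  // Called when a request arrives. Returns false once close() has been
  // requested, meaning the caller should answer 503 without running
  // the user handler.
  tryBegin(): boolean {
    if (this.closing) return false;
    this.inFlight++;
    return true;
  }

  // Called when a request's response has been fully sent.
  end(): void {
    this.inFlight--;
    this.maybeFinish();
  }

  // Marks the server as closing; the callback waits for in-flight requests.
  close(cb: () => void): void {
    this.closing = true;
    this.onClosed = cb;
    this.maybeFinish();
  }

  private maybeFinish(): void {
    if (this.closing && this.inFlight === 0 && this.onClosed !== null) {
      const cb = this.onClosed;
      this.onClosed = null; // ensure the callback runs at most once
      cb();
    }
  }
}
```

With no requests in flight, `close(cb)` completes immediately in this model. With one in flight, the callback waits for the matching `end()` call.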

Tests

  • Restored previously-quarantined packages/ui-server/src/__tests__/node-handler.test.ts and packages/docs/src/__tests__/docs-cli-actions.test.ts from .local.ts — both now pass under vtz test (26 + 11 tests, all green).
  • New regression suite at packages/vertz/__tests__/node-http.test.ts with 5 scenarios covering listen(0) callback firing, buffered serve, empty-pending close, deferred close with an in-flight request, and bind failure reporting.
  • Two new Rust unit tests in http_serve.rs (test_http_serve_graceful_shutdown_preserves_in_flight_response, test_http_serve_close_does_not_throw_unknown_server_id).
  • Removed the test:integration npm scripts that existed purely to run these files under bun; image-processor.local.ts stays (NAPI, out of scope).

Quality gates

  • cargo test -p vtz --lib — 3452 passing, 0 failing
  • cargo clippy -p vtz --all-targets -- -D warnings — clean
  • cargo fmt --all -- --check — clean
  • vtz test in packages/ui-server (node-handler, 26 pass)
  • vtz test in packages/docs (263 pass)
  • vtz test in packages/vertz (21 pass incl. 5 new)
  • tsgo --noEmit in ui-server, docs, vertz — clean

Note: the pre-existing test_inspect_brk_unblocks_after_debugger_connects flake under concurrent cargo test --all is unrelated; it passes in isolation.

Public API changes

None — internal runtime behavior only. Fixes bugs, no new surface.

Test plan

  • Regression suite covers the hang scenario end-to-end via public node:http API
  • Rust unit tests cover the graceful-shutdown + post-close response paths
  • Restored quarantined .test.ts files exercise node:http in anger (SSR tests with dozens of buffered + streaming requests per test run)

#2720)

The synthetic `node:http` module's `createServer().listen()` treated
`globalThis.__vtz_http.serve()` as async, but `serve()` is a synchronous
op that returns the server object directly. The resulting `.then(...)`
on a non-thenable threw a TypeError that was swallowed by the
`new Promise((resolve) => server.listen(0, resolve))` idiom, so the
listen callback never fired and tests hung at the 120s watchdog. The
same bug existed in the CJS `require('http')` shim.

Fixing the shim surfaced three secondary defects in the op layer:

- `close()` aborted the axum task immediately, cancelling in-flight
  response futures mid-reply. Replaced the abort-handle teardown with
  axum's `with_graceful_shutdown(...)` signal so existing connections
  drain before the task exits.
- `op_http_serve_respond` keyed the pending oneshot map per server, so
  replying after close (which removed the ServerInstance) silently
  dropped the response. Moved pending-responses onto HttpServeState
  and keyed it globally by request_id.
- `op_http_serve_accept` / `op_http_serve_respond` returned "Unknown
  server id" errors when the accept loop re-polled after close(),
  poisoning the event loop. Both ops now treat a missing server as a
  soft null / no-op.

The JS createServer() shim also gained proper Node-compatible
semantics: listen(cb) fires the callback via queueMicrotask, close(cb)
defers until all in-flight requests finish, and new connections
received after close() get a 503 without entering the user handler.

Restored `node-handler.local.ts` and `docs-cli-actions.local.ts` to
`.test.ts` — both now pass under `vtz test`. Removed the bun-fallback
`test:integration` npm scripts that existed to keep these files
running. `image-processor.local.ts` stays quarantined (NAPI, separate
issue).

Regression coverage:
- packages/vertz/__tests__/node-http.test.ts — 5 end-to-end scenarios
  covering listen/fetch/close semantics and graceful shutdown.
- native/vtz/src/runtime/ops/http_serve.rs — two new Rust unit tests
  covering graceful shutdown of in-flight responses and the
  no-unknown-server-id contract.

Closes #2718.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

viniciusdacal merged commit d5d0a76 into main on Apr 17, 2026
5 of 6 checks passed