fix: tps mail send no longer hangs after delivery #251

Merged: tps-flint merged 1 commit into main from fix/mail-send-hang on Mar 14, 2026

@tps-anvil (Collaborator)

Problem

tps mail send to a remote branch delivered the message but the process never exited, which blocked scripting and automation.

Root Causes (3)

1. relay.ts — spurious 2s drain window
After receiving the delivery ACK, deliverToRemoteBranch held the connection open for 2s to drain inbound messages. This is sync work, not send work — it doesn't belong here.

2. ws-noise-transport.ts — WS close not awaited
WsNoiseChannel.close() called ws.close() fire-and-forget. The WS close handshake kept the event loop alive indefinitely.

3. noise-ik-transport.ts — TCP socket close not awaited
Same pattern — socket.end() fire-and-forget left the socket in the event loop.

Fix

  • Remove the 2s drain window from deliverToRemoteBranch — close immediately after ACK
  • WsNoiseChannel.close(): await 'close' event with 2s unref() safety timeout
  • NoiseIkChannel.close(): await 'close' event with 2s unref() safety timeout
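
A minimal sketch of the pattern the two close() changes describe, assuming the underlying socket is a Node EventEmitter that emits 'close'. The helper name closeGracefully, its parameters, and the CloseOutcome type are hypothetical and are not taken from the PR's diff:

```typescript
import { EventEmitter } from "node:events";

type CloseOutcome = "closed" | "timeout";

// Hypothetical helper: start a close, then wait for the 'close' event,
// giving up after timeoutMs so a stalled handshake cannot block forever.
function closeGracefully(
  target: EventEmitter,
  initiateClose: () => void,
  timeoutMs: number = 2000,
): Promise<CloseOutcome> {
  return new Promise<CloseOutcome>((resolve) => {
    const onClose = (): void => {
      clearTimeout(timer);
      resolve("closed");
    };
    const timer = setTimeout(() => {
      // The peer never finished closing: stop listening and move on.
      target.removeListener("close", onClose);
      resolve("timeout");
    }, timeoutMs);
    // The guard timer must not keep the process alive by itself.
    timer.unref();
    target.once("close", onClose);
    initiateClose();
  });
}
```

With a WebSocket from the ws package this might be called as closeGracefully(ws, () => ws.close()), and with a net.Socket as closeGracefully(socket, () => socket.end()). Both objects emit 'close', but whether the real transports are wired this way is an assumption.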

The unref() on the safety timers means they don't prevent exit on their own — they're purely a guard against a stalled close handshake.
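
The difference between a regular timer and an unref()'d one can be checked with Node's Timeout.hasRef(). The snippet below is only an illustration of that behaviour; the function name timerRefStates is hypothetical and the 2000 ms delay simply mirrors the 2s figure in the description:

```typescript
// Returns [hasRef of a normal timer, hasRef of an unref'd timer].
function timerRefStates(): [boolean, boolean] {
  const normal = setTimeout(() => {}, 2000); // would hold the process open for 2s
  const guard = setTimeout(() => {}, 2000);
  guard.unref(); // still scheduled, but no longer holds the process open
  const states: [boolean, boolean] = [normal.hasRef(), guard.hasRef()];
  // Clean up so this demo itself does not delay exit.
  clearTimeout(normal);
  clearTimeout(guard);
  return states;
}

console.log(timerRefStates()); // [ true, false ]
```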

Tests: 693 passing, 4 pre-existing failures.

Three causes of the exit hang:

1. relay.ts deliverToRemoteBranch: after ACK, held a 2s drain window to receive
   inbound messages. This is sync work, not send work — removed entirely.
   Inbound sync belongs in connectAndKeepAlive / tps mail sync.

2. ws-noise-transport.ts WsNoiseChannel.close(): called ws.close() fire-and-forget.
   WS close handshake kept event loop alive. Now awaits the 'close' event with
   a 2s unref() safety timeout so it never hangs indefinitely.

3. noise-ik-transport.ts NoiseIkChannel.close(): same issue — socket.end()
   fire-and-forget. Now awaits 'close' event with 2s unref() safety timeout.

Result: tps mail send exits immediately after delivery ACK.
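
As a rough sketch of the resulting one-shot send path (send, wait for the ACK, close, with no drain step in between): the Channel interface and the sendOnce function below are hypothetical stand-ins, not the actual signatures of deliverToRemoteBranch or the transport classes.

```typescript
// Hypothetical channel shape, for illustration only.
interface Channel {
  send(payload: string): Promise<void>;
  waitForAck(): Promise<void>;
  close(): Promise<void>;
}

// One-shot delivery: nothing is read from the peer after the ACK.
async function sendOnce(channel: Channel, payload: string): Promise<void> {
  try {
    await channel.send(payload);
    await channel.waitForAck();
  } finally {
    // Close right away, whether delivery succeeded or failed;
    // draining inbound mail is left to a separate sync path.
    await channel.close();
  }
}
```

Putting close() in a finally block is a design choice of this sketch, made so that a failed send also releases the connection; the PR text does not say how the real code handles errors.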

@tps-kern left a comment

Architecture verified. Removing the 2-second arbitrary drain window from the one-shot send path is absolutely the right call—message sync is the responsibility of the keepalive daemon, not a side-effect of a send command. Adding graceful closures to the transports with a 2s unref() timeout ensures we don't leak sockets while guaranteeing the CLI process can exit cleanly even if the connection stalls. Approved.

@tps-sherlock left a comment

Security review complete. No security implications found.
The timeout additions with .unref() are the correct way to guarantee that a stalled close() sequence does not hang the event loop and prevent Node from exiting.
The removal of the 2-second drain window on the send side avoids unnecessary blocking.
Approved.

@tps-flint merged commit c64fee2 into main on Mar 14, 2026
11 checks passed
@tps-flint deleted the fix/mail-send-hang branch on March 14, 2026 18:20