fix(ws): serialise session close against in-flight writes (spec-026 US3) #525
Merged
NostrRelayClient called the no-arg clientSession.close() in close() and the send() timeout path. Spring 6.2.x's ConcurrentWebSocketSessionDecorator overrides only close(CloseStatus) (guarded by closeLock); the no-arg close() falls through to delegate.close() with no coordination, racing an in-flight sendMessage flush on the Tomcat delegate (single-in-flight-write → IllegalStateException: Concurrent write operations are not permitted). This was the residual concurrent-write leakage observed during the spec-026 soak, a layer below wallet-lib's adapter sendLock.

Routing through close(CloseStatus) alone is insufficient: its closeLock is a separate lock from the flushLock guarding delegate.sendMessage.

Introduce a ReentrantReadWriteLock session gate: every write (send/subscribe via sendFrameGated) takes the read side (sends stay concurrent; the decorator still serialises the delegate writes among them), and every close (closeGated/closeQuietly, always close(CloseStatus)) takes the write side and waits for in-flight sends to drain before sending the CLOSE frame. No lock nesting, so no added deadlock risk.

Tests: new NostrRelayClientCloseWriteRaceTest (routing through CloseStatus + never no-arg close; timeout-path routing; latch-based serialisation proof). Updated NostrRelayClientTimeoutTest and SpringWebSocketClientTest, which asserted the old no-arg close(), to assert close(any(CloseStatus)) + never().close().

24/24 nostr-java-client tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR addresses a WebSocket close-vs-write race inside nostr.client.springwebsocket.NostrRelayClient by ensuring session closes are serialized against in-flight writes, avoiding Tomcat’s “Concurrent write operations are not permitted” IllegalStateException.
Changes:
- Added a read/write “session gate” to block `close(CloseStatus)` until all in-flight `sendMessage` calls drain, and routed all closes through `close(CloseStatus)` (never the no-arg `close()`).
- Refactored `send()`/`subscribe()` to use gated send + best-effort gated close helpers for timeout/overflow paths.
- Updated and added tests to assert close routing and demonstrate non-overlap between send and close, plus a changelog entry.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| nostr-java-client/src/main/java/nostr/client/springwebsocket/NostrRelayClient.java | Introduces RW-lock gating for close vs writes; reroutes send/subscribe/timeout/overflow closure through gated helpers. |
| nostr-java-client/src/test/java/nostr/client/springwebsocket/SpringWebSocketClientTest.java | Updates timeout expectations to verify close(CloseStatus) is used and close() is never called. |
| nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientTimeoutTest.java | Aligns timeout close assertions with the new close(CloseStatus) routing and ensures session is open for the close path. |
| nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientCloseWriteRaceTest.java | Adds regression coverage for close routing and close-vs-write serialization (delegate never sees overlap). |
| CHANGELOG.md | Adds an [Unreleased] “Fixed” entry describing the race and the locking-based resolution. |
Comments suppressed due to low confidence (2)
nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientCloseWriteRaceTest.java:173
- These comments attribute close() blocking to sendLock, but the blocking mechanism in the fix is the sessionGate write lock waiting for an in-flight sessionGate read lock held by sendFrameGated(). Updating the comments will keep the test explanation aligned with what it actually proves.
// With the fix, close() blocks on sendLock (held by the parked subscribe), so
// the delegate close has NOT run yet — exactly one op (the write) in flight.
awaitThreadParked(closer, 2_000);
assertEquals(1, delegateOps.get(),
"close() must not reach the delegate while a send is in flight on it");
nostr-java-client/src/test/java/nostr/client/springwebsocket/NostrRelayClientCloseWriteRaceTest.java:208
- The helper Javadoc mentions the thread is blocked acquiring sendLock, but in the current implementation close() waits on the sessionGate write lock (not sendLock). Update the wording so future maintainers don’t chase the wrong lock when debugging this test.
/**
* Poll until {@code thread} is parked (BLOCKED / WAITING / TIMED_WAITING) — i.e.
* blocked acquiring sendLock — or the timeout elapses.
*/
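The body of `awaitThreadParked` is not shown in this review excerpt. As a rough sketch only, assuming it simply polls `Thread.getState()` until the deadline, a helper matching the quoted Javadoc could look like the following. The class name `ThreadParkProbe` and the boolean return value are illustrative assumptions, not the test's actual code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a "wait until the thread is parked" test helper.
final class ThreadParkProbe {

    private ThreadParkProbe() {
    }

    /**
     * Polls until the thread is BLOCKED, WAITING or TIMED_WAITING, or the
     * timeout elapses. Returns true if the thread was seen parked in time.
     */
    static boolean awaitThreadParked(Thread thread, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() - deadline < 0) {
            Thread.State state = thread.getState();
            if (state == Thread.State.BLOCKED
                    || state == Thread.State.WAITING
                    || state == Thread.State.TIMED_WAITING) {
                return true;
            }
            Thread.sleep(5); // short poll interval; the value is arbitrary
        }
        return false;
    }
}
```

A thread waiting on a `ReentrantReadWriteLock` write lock reports `WAITING` rather than `BLOCKED`, which is presumably why the Javadoc lists all three states.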
Cuts 2.0.4 with the spec-026 US3 close-vs-write fix. Moves [Unreleased] changelog content into the 2.0.4 section. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Copilot review on PR #525:
- send(): clear pendingRequest on ANY sendFrameGated failure (not only SessionLimitExceededException). A plain IOException previously left the client stuck "in flight", breaking the next send()/@NostrRetryable retry.
- closeQuietly(): log the swallowed throwable (+ relayUri) instead of just e.getMessage(), preserving the stack trace.
- tests: correct stale sendLock references to the sessionGate RW-lock; narrow the timeout assertThrows from Exception to RelayTimeoutException.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
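To illustrate the `pendingRequest` fix, here is a minimal sketch of releasing an in-flight slot on any write failure. It assumes the slot is a single `AtomicReference` holding a future; the real field type and the `PendingSlotSketch`, `FrameWriter` and `hasPending` names are hypothetical and not taken from `NostrRelayClient`.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a single in-flight request slot.
final class PendingSlotSketch {

    /** Stand-in for the gated frame write; may fail with a plain IOException. */
    interface FrameWriter {
        void write(String frame) throws IOException;
    }

    private final AtomicReference<CompletableFuture<String>> pendingRequest =
            new AtomicReference<>();

    CompletableFuture<String> send(String frame, FrameWriter writer) throws IOException {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (!pendingRequest.compareAndSet(null, future)) {
            throw new IllegalStateException("a request is already in flight");
        }
        try {
            writer.write(frame);
        } catch (IOException | RuntimeException e) {
            // Release the slot on ANY failure so the next send() or retry can proceed.
            pendingRequest.compareAndSet(future, null);
            throw e;
        }
        return future;
    }

    boolean hasPending() {
        return pendingRequest.get() != null;
    }
}
```

If the slot were cleared only for one exception type, the `IOException` path would leave `pendingRequest` set and every later `send()` would fail with the "already in flight" error.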
Cuts 2.0.5 with the PR #525 review fixes: send() clears pendingRequest on any send failure (not just overflow) so a failed write can't wedge the in-flight slot, and closeQuietly() logs the throwable + relay URI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes the residual `IllegalStateException: Concurrent write operations are not permitted` observed during the spec-026 staging soak: a close-vs-write race inside `nostr.client.springwebsocket.NostrRelayClient`, below wallet-lib's adapter `sendLock`.

Root cause. `NostrRelayClient` wraps its session in Spring's `ConcurrentWebSocketSessionDecorator` but called the no-arg `clientSession.close()` (in `close()` and the `send()` timeout path). Spring 6.2.x's decorator overrides only `close(CloseStatus)` (guarded by `closeLock`); the no-arg `close()` falls through to `WebSocketSessionDecorator.close()` → `delegate.close()` with no coordination, sending a CLOSE frame straight to the Tomcat delegate while a `sendMessage` flush is in flight. Tomcat permits a single write in flight, so the CLOSE frame (`WsSession.sendCloseMessage`) racing a data frame (`WsRemoteEndpointImplBase.sendPartialString`) throws.

Routing through `close(CloseStatus)` alone is insufficient: its `closeLock` is a separate lock from the `flushLock` that guards `delegate.sendMessage`, so it only narrows the window rather than closing it.

Fix. Introduce a `ReentrantReadWriteLock` session gate:
- Every write (`send`/`subscribe`, via `sendFrameGated`) takes the read side: sends stay concurrent; the decorator still serialises the delegate writes among them.
- Every close (`closeGated`/`closeQuietly`, always `close(CloseStatus)`, never the no-arg `close()`) takes the write side, so it waits for all in-flight sends to drain before sending the CLOSE frame.

This preserves the existing decorator-concurrency design (and its tests) while fully closing the race.
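The gate described above can be sketched in isolation. This is a simplified illustration, not the actual `NostrRelayClient` code: the `SessionGateSketch` class and its `Runnable` parameters are hypothetical stand-ins for the real `sendFrameGated`/`closeGated` helpers and the session calls they wrap.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, stripped-down model of the read/write session gate.
final class SessionGateSketch {

    private final ReentrantReadWriteLock sessionGate = new ReentrantReadWriteLock();

    /** Writes share the read side, so several sends may be in flight at once. */
    void sendGated(Runnable write) {
        sessionGate.readLock().lock();
        try {
            write.run(); // stand-in for the decorator's sendMessage(...)
        } finally {
            sessionGate.readLock().unlock();
        }
    }

    /** Close takes the write side, so it waits until every in-flight send has drained. */
    void closeGated(Runnable close) {
        sessionGate.writeLock().lock();
        try {
            close.run(); // stand-in for close(CloseStatus)
        } finally {
            sessionGate.writeLock().unlock();
        }
    }
}
```

In this sketch the write callback must never call `closeGated` itself: `ReentrantReadWriteLock` does not support upgrading a held read lock to the write lock, so such a nested call would block forever.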
Changes
- `NostrRelayClient`: `sessionGate` RW-lock; `sendFrameGated`/`closeGated`/`closeQuietly`/`clearPendingRequest` helpers; `send`, `subscribe`, the `send()` timeout path, and `close()` rerouted through them.
- `CHANGELOG.md`: `[Unreleased] → Fixed` entry.

Test plan
- New `NostrRelayClientCloseWriteRaceTest`: (1) `close()` routes through `close(CloseStatus)` and never the no-arg `close()`; (2) the `send()` timeout path routes the same way; (3) latch-based serialisation proof: a close blocks on the gate while a send is in flight, so the delegate never sees a write and a close at once.
- Updated `NostrRelayClientTimeoutTest` + `SpringWebSocketClientTest` (which asserted the old buggy no-arg `close()`) to assert `close(any(CloseStatus))` + `never().close()`.
- `mvn -pl nostr-java-client -am test` → 24/24 pass. Concurrency + overflow tests preserved (overflow now ~13s because close correctly waits for the deliberately-stuck in-flight send).

🤖 Generated with Claude Code