Coordination: added timeout for session init waiting #653
When a coordination session loses its gRPC stream and starts reconnecting, a new `Stream` is created with `disableDeadline()`. If the underlying TCP connection becomes half-open (the handshake has completed, but no data flows), the gRPC stream never delivers `SessionStarted`, so `startFuture` never completes. The reconnect loop then stalls indefinitely, and every pending `CompletableFuture` (e.g. from `acquireEphemeralSemaphore`) stays unresolved even after the application-level acquire timeout has expired.

Fix: pass `connectTimeout` into `Stream` and schedule a one-shot timer that cancels the gRPC stream and completes `startFuture` with `TIMEOUT` if `SessionStarted` is not received within that window. The timer takes effect only if `startFuture` is still pending, so it is a no-op on the happy path.

Reported via YDB support: YDBREQUESTS-7830
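A minimal sketch of the one-shot start timeout described above, in plain Java with no gRPC dependency. `StartTimeoutGuard`, `StartStatus`, and the `cancelStream` hook are hypothetical names for illustration, not the SDK's actual classes, and the sketch assumes a `ScheduledExecutorService` drives the timer. It relies on `CompletableFuture.complete()` returning `false` when the future is already done, which is what makes the timer a no-op once `SessionStarted` has arrived.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the SDK's Stream class): guards a session-start
 * future with a one-shot timer that fires after connectTimeout.
 */
public class StartTimeoutGuard {
    enum StartStatus { STARTED, TIMEOUT }

    private final CompletableFuture<StartStatus> startFuture = new CompletableFuture<>();
    private final Runnable cancelStream; // hypothetical hook standing in for cancelling the gRPC stream

    StartTimeoutGuard(ScheduledExecutorService scheduler, Duration connectTimeout, Runnable cancelStream) {
        this.cancelStream = cancelStream;
        ScheduledFuture<?> timer =
                scheduler.schedule(this::onTimeout, connectTimeout.toMillis(), TimeUnit.MILLISECONDS);
        // Whichever path completes startFuture first, the timer is no longer needed.
        startFuture.whenComplete((status, error) -> timer.cancel(false));
    }

    /** Called when the server's SessionStarted message arrives. */
    void onSessionStarted() {
        startFuture.complete(StartStatus.STARTED);
    }

    private void onTimeout() {
        // complete() returns false if SessionStarted won the race,
        // so on the happy path this does nothing.
        if (startFuture.complete(StartStatus.TIMEOUT)) {
            cancelStream.run();
        }
    }

    CompletableFuture<StartStatus> startFuture() {
        return startFuture;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // Half-open case: SessionStarted never arrives.
            CountDownLatch stalledCancelled = new CountDownLatch(1);
            StartTimeoutGuard stalled =
                    new StartTimeoutGuard(scheduler, Duration.ofMillis(50), stalledCancelled::countDown);
            System.out.println(stalled.startFuture().get(1, TimeUnit.SECONDS)
                    + ", stream cancelled: " + stalledCancelled.await(1, TimeUnit.SECONDS));

            // Happy path: SessionStarted arrives well before the timeout.
            CountDownLatch healthyCancelled = new CountDownLatch(1);
            StartTimeoutGuard healthy =
                    new StartTimeoutGuard(scheduler, Duration.ofSeconds(5), healthyCancelled::countDown);
            healthy.onSessionStarted();
            System.out.println(healthy.startFuture().get()
                    + ", stream cancelled: " + (healthyCancelled.getCount() == 0));
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

The stalled case resolves to `TIMEOUT` and runs the cancel hook; in the healthy case `SessionStarted` completes the future first, and the `whenComplete` callback cancels the pending timer. On Java 9+, `CompletableFuture.completeOnTimeout` could cover the completion half on its own, but the stream would still need to be cancelled separately, which is why this sketch keeps an explicit timer.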
…store tests

Keep `Stream(Rpc rpc)` as a delegate to `Stream(rpc, Duration.ofSeconds(5))` so existing unit tests compile unchanged. `SessionImpl` passes the configured `connectTimeout` explicitly; the single-argument overload is used only in tests. Restore `StreamTest.java` (it was accidentally cleared) and revert `StreamIntegrationTest.java` to the original constructor call.
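The overload arrangement in that commit is a standard Java constructor-chaining pattern. A minimal sketch, assuming a stand-in `Rpc` interface (hypothetical; the real `Rpc` type and `Stream` fields are not shown in this PR page):

```java
import java.time.Duration;

public class StreamOverloadSketch {
    /** Hypothetical stand-in for the SDK's Rpc type. */
    interface Rpc {
        String name();
    }

    static final class Stream {
        private final Rpc rpc;
        private final Duration connectTimeout;

        /** Legacy single-argument constructor, kept so existing tests compile unchanged. */
        Stream(Rpc rpc) {
            this(rpc, Duration.ofSeconds(5));
        }

        /** Production path: the caller (SessionImpl in the PR) passes its configured connectTimeout. */
        Stream(Rpc rpc, Duration connectTimeout) {
            this.rpc = rpc;
            this.connectTimeout = connectTimeout;
        }

        Rpc rpc() {
            return rpc;
        }

        Duration connectTimeout() {
            return connectTimeout;
        }
    }

    public static void main(String[] args) {
        Rpc rpc = () -> "coordination";
        System.out.println(new Stream(rpc).connectTimeout());                          // PT5S
        System.out.println(new Stream(rpc, Duration.ofSeconds(10)).connectTimeout());  // PT10S
    }
}
```

Chaining through `this(...)` keeps a single place where fields are assigned, so the 5-second default in the legacy overload cannot drift from the logic in the two-argument constructor.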
Codecov Report
❌ Patch coverage is
@@             Coverage Diff              @@
##             master     #653     +/-   ##
============================================
+ Coverage     71.31%   71.44%   +0.12%
- Complexity     3365     3373       +8
============================================
  Files           379      379
  Lines         15920    15929       +9
  Branches       1669     1670       +1
============================================
+ Hits          11353    11380      +27
+ Misses         3917     3900      -17
+ Partials        650      649       -1
AI Review Summary
Verdict: ✅ No critical issues found
Critical issues
No critical issues found.
Other findings
- Minor | Medium: `cancelStream()` short-circuit `||` leaves `stopFuture` uncompleted when the start timeout fires, creating a brief window where `stop()` could send on a cancelled stream (`Stream.java:73`)
- Minor | Medium: `sendSessionStart()` does not guard against a pre-completed `startFuture`, so calling it after `closeStream()` sends on a half-closed gRPC stream (`Stream.java:93`)
- Nit | High: Typo in test comment: `STREAM_CANCEL_TIMOUT_MS` → `STREAM_CANCEL_TIMEOUT_MS` (`StreamTest.java:154`)
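The two Minor findings can be illustrated with a small sketch. The actual `Stream.java` code is not shown on this page, so this assumes the flagged line has the shape `startFuture.complete(x) || stopFuture.complete(x)`, where `||` skips the second call whenever the first succeeds. `StreamGuardsSketch`, `FakeGrpcStream`, and `Status` are hypothetical names; the sketch completes both futures unconditionally and has the send paths check their future first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class StreamGuardsSketch {
    enum Status { SUCCESS, TIMEOUT }

    /** Hypothetical stand-in for the gRPC stream: records sends and cancellation. */
    static final class FakeGrpcStream {
        final List<String> sent = new ArrayList<>();
        boolean cancelled;

        void send(String msg) {
            if (cancelled) {
                throw new IllegalStateException("send on cancelled stream");
            }
            sent.add(msg);
        }

        void cancel() {
            cancelled = true;
        }
    }

    final FakeGrpcStream grpc = new FakeGrpcStream();
    final CompletableFuture<Status> startFuture = new CompletableFuture<>();
    final CompletableFuture<Status> stopFuture = new CompletableFuture<>();

    /** Evaluates BOTH complete() calls; `a || b` would skip stopFuture whenever startFuture completes. */
    boolean cancelStream(Status status) {
        boolean startCompleted = startFuture.complete(status);
        boolean stopCompleted = stopFuture.complete(status);
        if (startCompleted || stopCompleted) {
            grpc.cancel();
            return true;
        }
        return false;
    }

    /** Refuses to send once startFuture is already completed (e.g. by the start timeout). */
    boolean sendSessionStart() {
        if (startFuture.isDone()) {
            return false;
        }
        grpc.send("SessionStart");
        return true;
    }

    /** Hypothetical stop(): only sends while stopFuture is still pending. */
    boolean stop() {
        if (stopFuture.isDone()) {
            return false;
        }
        grpc.send("SessionStop");
        return true;
    }

    public static void main(String[] args) {
        StreamGuardsSketch s = new StreamGuardsSketch();
        s.cancelStream(Status.TIMEOUT);             // the start-timeout path
        System.out.println(s.stopFuture.isDone());  // true
        System.out.println(s.stop());               // false
        System.out.println(s.sendSessionStart());   // false
        System.out.println(s.grpc.sent);            // []
    }
}
```

The `isDone()` checks are still check-then-act, so real code would need the stream's operations serialized (for example under a lock or on a single executor) to close the window fully; the sketch only shows which futures each path should consult.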
This review was generated automatically. Critical issues require attention; other findings are advisory.
Analysis performed by claude, claude-opus-4-6.
f644bf9 to 7c5ff79
AI Review Summary
Verdict: ✅ No critical issues found
Critical issues
No critical issues found.
This review was generated automatically. Critical issues require attention; other findings are advisory.
Analysis performed by claude, claude-opus-4-6.