Skip to content

Coordination: added timeout for session init waiting#653

Merged
alex268 merged 4 commits into
ydb-platform:masterfrom
alex268:feature/coordination-reconnect-timeout
May 27, 2026
Merged

Coordination: added timeout for session init waiting#653
alex268 merged 4 commits into
ydb-platform:masterfrom
alex268:feature/coordination-reconnect-timeout

Conversation

@alex268
Copy link
Copy Markdown
Member

@alex268 alex268 commented May 27, 2026

No description provided.

dmitrybitquery and others added 3 commits May 21, 2026 20:21
When a coordination session loses its gRPC stream and starts reconnecting,
a new Stream is created with disableDeadline(). If the underlying TCP
connection enters a half-open state (SYN established but no data flows),
the gRPC stream never delivers SessionStarted, so startFuture never
completes. The reconnect loop then stalls indefinitely, leaving all
pending CompletableFutures (e.g. acquireEphemeralSemaphore) unresolved
even after the application-level acquire timeout has expired.

Fix: pass connectTimeout into Stream and schedule a one-shot timer that
cancels the gRPC stream and completes startFuture with TIMEOUT if
SessionStarted is not received within that window. The timer fires only
when startFuture is still pending, so it is a no-op on the happy path.

Reported via YDB support: YDBREQUESTS-7830
…store tests

Keep Stream(Rpc rpc) as a delegate to Stream(rpc, Duration.ofSeconds(5)) so
existing unit tests compile unchanged. SessionImpl passes the configured
connectTimeout explicitly; the no-arg overload is only used in tests.

Restore StreamTest.java (was accidentally cleared) and revert
StreamIntegrationTest.java to the original constructor call.
@alex268 alex268 changed the title Feature/coordination reconnect timeout Coordination: added timeout for session init waiting May 27, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 27, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.44%. Comparing base (eff1645) to head (7c5ff79).

Files with missing lines Patch % Lines
...c/main/java/tech/ydb/coordination/impl/Stream.java 88.23% 0 Missing and 2 partials ⚠️
...n/java/tech/ydb/coordination/impl/SessionImpl.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #653      +/-   ##
============================================
+ Coverage     71.31%   71.44%   +0.12%     
- Complexity     3365     3373       +8     
============================================
  Files           379      379              
  Lines         15920    15929       +9     
  Branches       1669     1670       +1     
============================================
+ Hits          11353    11380      +27     
+ Misses         3917     3900      -17     
+ Partials        650      649       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread coordination/src/main/java/tech/ydb/coordination/impl/Stream.java Outdated
Comment thread coordination/src/main/java/tech/ydb/coordination/impl/Stream.java Outdated
Comment thread coordination/src/main/java/tech/ydb/coordination/impl/Stream.java Outdated
@robot-vibe-db
Copy link
Copy Markdown

robot-vibe-db Bot commented May 27, 2026

AI Review Summary

Verdict: ✅ No critical issues found

Critical issues

No critical issues found.

Other findings

  • Minor | Medium: closeStream() uses stream.close() (half-close) instead of stream.cancel() (immediate abort) and does not complete stopFuture, leaving gRPC resources to linger — Stream.java:66-69
  • Minor | Medium: Small race window between cancel timer and onSessionStarted: if both fire near-simultaneously, cancelStream() can disrupt an already-established session — Stream.java:93
  • Minor | High: Mock scheduler not stubbed for schedule(), returns null, causing silent NPE in startFuture.whenComplete() handler; timer-cleanup path is never exercised in tests — StreamTest.java:28
  • Minor | High: connectTimeoutTest does not verify that startFuture completes with CLIENT_CANCELLED after the cancel timer fires — StreamTest.java:151
  • Nit | High: Typo in constant name STREAM_CANCEL_TIMOUT_MSSTREAM_CANCEL_TIMEOUT_MSStream.java:31

This review was generated automatically. Critical issues require attention; other findings are advisory.
If this comment was useful, please give it a 👍 — it helps us improve the review bot.

@robot-vibe-db
Copy link
Copy Markdown

robot-vibe-db Bot commented May 27, 2026

Full analysis log

Analysis performed by claude, claude-opus-4-6.

Copy link
Copy Markdown

@robot-vibe-db robot-vibe-db Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Review Summary

Verdict: ✅ No critical issues found

Critical issues

No critical issues found.

Other findings

  • Minor | Medium: cancelStream() short-circuit || leaves stopFuture uncompleted when the start-timeout fires, creating a brief window where stop() could send on a cancelled stream — Stream.java:73
  • Minor | Medium: sendSessionStart() does not guard against a pre-completed startFuture, so calling it after closeStream() sends on a half-closed gRPC stream — Stream.java:93
  • Nit | High: Typo in test comment STREAM_CANCEL_TIMOUT_MSSTREAM_CANCEL_TIMEOUT_MSStreamTest.java:154

This review was generated automatically. Critical issues require attention; other findings are advisory.
If this comment was useful, please give it a 👍 — it helps us improve the review bot.

Comment thread coordination/src/main/java/tech/ydb/coordination/impl/Stream.java Outdated
Comment thread coordination/src/main/java/tech/ydb/coordination/impl/Stream.java
Comment thread coordination/src/test/java/tech/ydb/coordination/impl/StreamTest.java Outdated
@robot-vibe-db
Copy link
Copy Markdown

robot-vibe-db Bot commented May 27, 2026

Full analysis log

Analysis performed by claude, claude-opus-4-6.

@alex268 alex268 force-pushed the feature/coordination-reconnect-timeout branch from f644bf9 to 7c5ff79 Compare May 27, 2026 14:19
@alex268 alex268 requested a review from KirillKurdyukov May 27, 2026 14:23
Comment thread coordination/src/main/java/tech/ydb/coordination/impl/Stream.java
Comment thread coordination/src/main/java/tech/ydb/coordination/impl/Stream.java
@robot-vibe-db
Copy link
Copy Markdown

robot-vibe-db Bot commented May 27, 2026

AI Review Summary

Verdict: ✅ No critical issues found

Critical issues

No critical issues found.

Other findings

  • Major | Medium: Race condition in cancelStream() — when fired from the start-timeout timer, it can cancel a successfully-started session if the server's SessionStarted response completes startFuture just before the timer fires but after timer.cancel(true) can no longer prevent execution. The wasNotStopped condition is always true for a running session, causing the || to trigger stream.cancel() even though the start was acknowledged. — coordination/src/main/java/tech/ydb/coordination/impl/Stream.java:75
  • Minor | Medium: sendSessionStart() calls stream.sendNext() on a potentially closed gRPC stream without guarding against startFuture already being completed (e.g., after closeStream()) — coordination/src/main/java/tech/ydb/coordination/impl/Stream.java:92

This review was generated automatically. Critical issues require attention; other findings are advisory.
If this comment was useful, please give it a 👍 — it helps us improve the review bot.

@robot-vibe-db
Copy link
Copy Markdown

robot-vibe-db Bot commented May 27, 2026

Full analysis log

Analysis performed by claude, claude-opus-4-6.

@alex268 alex268 merged commit 7ae4382 into ydb-platform:master May 27, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants