Retry once on stale-connection ConnectionError in Pool#with #2
Merged
Pool checkouts can hand out connections whose server-side end has
been closed while idle (CH idle_connection_timeout / tcp_keep_alive_timeout,
LB / firewall idle drops, server restart). The first read on such a
socket sees recv() == 0, which clickhouse-cpp surfaces as
std::system_error(errno=0, "closed")
and we wrap as ClickhouseNative::ConnectionError("closed: Success").
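To make the "clean close" symptom concrete, here is a minimal sketch in plain Ruby. It assumes a local `UNIXSocket.pair` is an acceptable stand-in for the pooled TCP connection (it is not how the gem or clickhouse-cpp talks to the server), and the variable names are illustrative only. It shows that the first read after the peer has closed reports end-of-file rather than an error code, which is why no meaningful errno accompanies the failure.

```ruby
require "socket"

# A connected local socket pair stands in for the client <-> server connection.
client_end, server_end = UNIXSocket.pair

# The "server" closes its end while the client sits idle in the pool.
server_end.close

# The first read on the stale socket sees end-of-file instead of data.
# IO#read with a positive length returns nil at EOF, which is the Ruby-level
# counterpart of recv() returning 0 in the C++ client.
first_read = client_end.read(1)

puts "peer closed cleanly: #{first_read.nil?}"
client_end.close
```

Nothing is raised at this level; the caller only learns the connection is dead by attempting to use it, which is why the failure shows up on the first operation after checkout.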
The pool already discards the dead client (`discard_current_connection`
in the rescue) so the next checkout is fresh — but we used to re-raise
the error to the caller, who'd see a noisy "closed: Success" failure
on the very first operation after their pooled client went stale.
Now Pool#with discards-and-retries once on ConnectionError. If the
retry also fails (e.g. server actually down), we raise. Other error
classes (ServerError, ProtocolError, EncoderError, etc.) are still
not retried — those are real and re-running could mask logic bugs
or, for ServerError on a half-completed write, double-execute.
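The discard-and-retry-once behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the gem's actual code: the `RetrySketch` module, its error classes, the single-client "pool", and the factory block are hypothetical stand-ins; only the name `discard_current_connection` is borrowed from the description.

```ruby
module RetrySketch
  # Hypothetical stand-ins for the gem's error classes.
  class ConnectionError < StandardError; end
  class ServerError < StandardError; end

  # Single-client "pool" sketch. The block passed to `new` is an assumed
  # factory that builds a fresh client on demand.
  class Pool
    def initialize(&factory)
      @factory = factory
      @client = nil
    end

    def with
      retried = false
      begin
        @client ||= @factory.call
        yield @client
      rescue ConnectionError
        # Stale socket: drop the client, then retry exactly once on a fresh one.
        discard_current_connection
        raise if retried
        retried = true
        retry
      rescue StandardError
        # Any other error: drop the client and surface the error immediately.
        discard_current_connection
        raise
      end
    end

    private

    def discard_current_connection
      @client = nil
    end
  end
end

# Usage: the first attempt hits a "stale" client, the retry succeeds.
builds = 0
attempts = 0
pool = RetrySketch::Pool.new { builds += 1 } # the client is just its build number
result = pool.with do |client|
  attempts += 1
  raise RetrySketch::ConnectionError, "closed: Success" if attempts == 1
  "ok from client #{client}"
end
puts result # prints "ok from client 2"
```

Note that the caller's block runs twice when a retry happens, so this pattern is only safe when the first attempt is known not to have taken effect, which is the reasoning behind limiting the retry to `ConnectionError`.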
Three new specs cover the path: success on retry, raise after the
second failure, and no retry on non-ConnectionError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
We hit this in production:
The error originates in clickhouse-cpp's `SocketInput::DoRead`: `recv() == 0` means the peer cleanly closed the TCP connection (FIN). On a clean close `errno` isn't set, so `strerror(0) == "Success"`, hence the cryptic `"closed: Success"` message.
Common causes for a pooled connection going stale:
- ClickHouse `idle_connection_timeout` / `tcp_keep_alive_timeout` (defaults: 3600s / 290s).
- Load balancer or firewall idle drops.
- Server restart.
In the prod log, the failed INSERT took only `8.388ms`, far faster than any real round-trip; recv returned 0 immediately. The connection was already half-closed by the server before any meaningful work happened, so the INSERT didn't execute server-side and a retry on a fresh socket is safe.
Implementation
`Pool#with` already discards the dead client on error (`discard_current_connection`), so the next checkout would be fresh. We just need to actually do that next checkout instead of bubbling the error to callers. New behavior:
- `ConnectionError` → discard, retry once. If the retry also fails (e.g. server genuinely down), raise.
- Any other error class (`ServerError`, `ProtocolError`, `EncoderError`, `UnsupportedTypeError`, …) → discard, raise immediately. No retry.
The retry is intentionally narrow: `ConnectionError` is what fires for stale-pool sockets where the server closed before we sent anything meaningful. Other error classes can fire mid-operation, and re-running them could mask bugs or double-execute writes.
Tests
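Three new specs in the existing `Pool` describe block:
- retries once on `ConnectionError` and succeeds on the fresh connection,
- raises after a second consecutive `ConnectionError`,
- does not retry on `ServerError`.
Full suite: 63 examples, 0 failures. Rubocop clean.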
Test plan
- `bundle exec rspec`
- `bundle exec rubocop`
🤖 Generated with Claude Code