Try all addresses from DNS before failing to connect#300
Try all addresses from DNS before failing to connect#300bjosv merged 4 commits intovalkey-io:mainfrom
Conversation
Fixes: - Fix TCP_NODELAY not being set on immediate connect success (blocking mode) Improvements: - Close socket on EADDRNOTAVAIL when retry limit is exhausted - Remove confusing wait_for_ready goto label - each error case is now self-contained - Make the else branch explicitly report the error instead of misusing valkeyContextWaitReady as an error reporter - Reorder errno checks to put EINPROGRESS (the common case) first Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
When a hostname resolves to multiple addresses, iterate through all of them before giving up. Previously only EHOSTUNREACH would try the next address, all other connect errors failed immediately. Now any connect failure (ECONNREFUSED, ETIMEDOUT, ENETUNREACH, etc.) tries the next address in the list. Only socket option errors (setsockopt, fcntl) fail immediately since those are system-level and won't be fixed by a different IP. Note: the user-provided connect timeout applies per address, not to the overall attempt. With N addresses that all time out, the total wait is up to N × timeout. Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
|
When digging in hiredis history the handling of With this change we support usecases when running in K8s and connecting to a service. |
|
This is much nicer. That jump to |
Update testcase to catch this case, changed the connect errno to match real failures. Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
| void valkeyClearError(valkeyContext *c) { | ||
| c->err = 0; | ||
| memset(c->errstr, '\0', strlen(c->errstr)); | ||
| } |
There was a problem hiding this comment.
We do this in various places. Should we use the function always?
There was a problem hiding this comment.
Not that many actually, but it would be good to use this function always, like in valkeyReconnect.
| void valkeySetError(valkeyContext *c, int type, const char *str) { | ||
| size_t len; | ||
|
|
||
| c->err = type; | ||
| if (str != NULL) { | ||
| len = strlen(str); | ||
| len = len < (sizeof(c->errstr) - 1) ? len : (sizeof(c->errstr) - 1); | ||
| memcpy(c->errstr, str, len); | ||
| c->errstr[len] = '\0'; | ||
| } else { | ||
| /* Only VALKEY_ERR_IO may lack a description! */ | ||
| assert(type == VALKEY_ERR_IO); | ||
| strerror_r(errno, c->errstr, sizeof(c->errstr)); | ||
| } | ||
| } |
There was a problem hiding this comment.
I see that valkeySetErrorFromErrno is very similar to this special case. We could sort out the different cases, but it might not be the scope of this PR.
There was a problem hiding this comment.
Ah, yes, good idea. Another PR where we also use valkeyClearError as mentioned in the other comment.
When a hostname resolves to multiple addresses, libvalkey now iterates through all of them before giving up.
Previously only
EHOSTUNREACHwould try the next address; all other connect errors failed immediately.The connect_timeout applies per address, not to the overall attempt.
With N addresses that all time out, the total wait is up to N * timeout.
The first commit refactors the connect error handling to prepare for the change:
wait_for_readygoto labelTCP_NODELAYnot being set on immediate connect successerrnochecks to putEINPROGRESSfirstThe second commit adds the fallback behavior and a unit test that overrides
getaddrinfoandconnectto verify that the library falls back to a working address after earlier ones fail withECONNREFUSED.