Try all addresses from DNS before failing to connect by bjosv · Pull Request #300 · valkey-io/libvalkey

bjosv · 2026-04-13T16:36:53Z

When a hostname resolves to multiple addresses, libvalkey now iterates through all of them before giving up.
Previously only EHOSTUNREACH would try the next address; all other connect errors failed immediately.

The connect_timeout applies per address, not to the overall attempt.
With N addresses that all time out, the total wait is up to N * timeout.
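To make the per-address timeout concrete, here is a small arithmetic sketch. The helper name `worst_case_wait_ms` is hypothetical and not part of libvalkey; it only assumes the timeout is expressed as a `struct timeval`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper (not a libvalkey function): worst-case total wait in
 * milliseconds when every one of n_addresses attempts runs into the
 * per-address connect timeout. */
static long long worst_case_wait_ms(const struct timeval *per_address,
                                    size_t n_addresses) {
    assert(per_address != NULL);
    long long one_ms = (long long)per_address->tv_sec * 1000 +
                       (long long)per_address->tv_usec / 1000;
    return one_ms * (long long)n_addresses;
}
```

With a 1.5 second timeout and three addresses that all time out, the caller may wait up to 4.5 seconds before the connect attempt is reported as failed.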

The first commit refactors the connect error handling to prepare for the change:

- Removes the confusing wait_for_ready goto label
- Fixes TCP_NODELAY not being set on immediate connect success
- Reorders errno checks to put EINPROGRESS first
- Makes each error branch self-contained
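For readers unfamiliar with the TCP_NODELAY item, the option is set with a plain `setsockopt` call, and the fix is about making sure that call also happens when `connect()` succeeds immediately. The following is a minimal sketch using only standard socket calls; the helper names are hypothetical and this is not the code from src/net.c.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_tcp_nodelay(int fd) {
    assert(fd >= 0);
    int yes = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &yes, sizeof(yes));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is enabled, 0 if it is
 * disabled, and -1 if the option could not be read. */
static int tcp_nodelay_enabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

Calling the same helper from both the "connected immediately" path and the "connected after waiting" path is one way to avoid the kind of omission this commit fixes.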

The second commit adds the fallback behavior and a unit test that overrides getaddrinfo and connect to verify that the library falls back to a working address after earlier ones fail with ECONNREFUSED.

Fixes:
- Fix TCP_NODELAY not being set on immediate connect success (blocking mode)

Improvements:
- Close socket on EADDRNOTAVAIL when retry limit is exhausted
- Remove confusing wait_for_ready goto label - each error case is now self-contained
- Make the else branch explicitly report the error instead of misusing valkeyContextWaitReady as an error reporter
- Reorder errno checks to put EINPROGRESS (the common case) first

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>

When a hostname resolves to multiple addresses, iterate through all of them before giving up. Previously only EHOSTUNREACH would try the next address; all other connect errors failed immediately.

Now any connect failure (ECONNREFUSED, ETIMEDOUT, ENETUNREACH, etc.) tries the next address in the list. Only socket option errors (setsockopt, fcntl) fail immediately since those are system-level and won't be fixed by a different IP.

Note: the user-provided connect timeout applies per address, not to the overall attempt. With N addresses that all time out, the total wait is up to N × timeout.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
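The rule in this commit message can be summarized as a small decision function. This is a sketch of the described policy only: the enum and function names are hypothetical, and the EADDRNOTAVAIL retry handling mentioned in the first commit is deliberately left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the call that failed and for what to do next. */
enum failed_call { CALL_CONNECT, CALL_SETSOCKOPT, CALL_FCNTL };
enum next_step { STEP_WAIT_FOR_READY, STEP_TRY_NEXT_ADDRESS, STEP_FAIL_NOW };

static enum next_step classify_failure(enum failed_call call, int err) {
    assert(err != 0); /* a failed call always reports an errno */
    /* Socket option errors are system-level; another IP will not help. */
    if (call != CALL_CONNECT)
        return STEP_FAIL_NOW;
    /* Checked first among connect errors: the common non-blocking case. */
    if (err == EINPROGRESS)
        return STEP_WAIT_FOR_READY;
    /* ECONNREFUSED, ETIMEDOUT, ENETUNREACH, EHOSTUNREACH, and so on. */
    return STEP_TRY_NEXT_ADDRESS;
}
```

Keying the decision on which call failed, rather than on a list of specific errno values, is what lets any connect failure fall through to the next address.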

bjosv · 2026-04-13T17:00:46Z

Digging through the hiredis history, I found that the handling of EHOSTUNREACH was added for dual-stack systems where getaddrinfo returns an IPv6 address before IPv4. If the host has no IPv6 connectivity, connect() to the IPv6 address fails with EHOSTUNREACH; c4ed06d fixed only that case.

With this change we support use cases where the client runs in K8s and connects to a service.

michael-grunder · 2026-04-13T19:03:52Z

This is much nicer. That jump to wait_for_ready inside an else always confused me 😄

zuiderkwast · 2026-04-16T08:22:08Z

+void valkeyClearError(valkeyContext *c) {
+    c->err = 0;
+    memset(c->errstr, '\0', strlen(c->errstr));
+}


We do this in various places. Should we use the function always?

Not that many actually, but it would be good to use this function always, like in valkeyReconnect.

zuiderkwast · 2026-04-16T08:28:17Z

 void valkeySetError(valkeyContext *c, int type, const char *str) {
    size_t len;

    c->err = type;
    if (str != NULL) {
        len = strlen(str);
        len = len < (sizeof(c->errstr) - 1) ? len : (sizeof(c->errstr) - 1);
        memcpy(c->errstr, str, len);
        c->errstr[len] = '\0';
    } else {
        /* Only VALKEY_ERR_IO may lack a description! */
        assert(type == VALKEY_ERR_IO);
        strerror_r(errno, c->errstr, sizeof(c->errstr));
    }
 }


I see that valkeySetErrorFromErrno is very similar to this special case. We could sort out the different cases, but it might not be the scope of this PR.

Ah, yes, good idea. Another PR where we also use valkeyClearError as mentioned in the other comment.

bjosv added 2 commits April 13, 2026 16:20

bjosv requested review from michael-grunder and zuiderkwast April 13, 2026 17:00

michael-grunder reviewed Apr 13, 2026

Comment thread src/net.c

michael-grunder reviewed Apr 13, 2026

Comment thread src/net.c

fixup: Clear error from any failed addresses before a success

bbf64a9

Update testcase to catch this case, changed the connect errno to match real failures. Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>

michael-grunder approved these changes Apr 14, 2026

fixup: run clang-format on test code

776bd2e

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>

zuiderkwast reviewed Apr 16, 2026

zuiderkwast approved these changes Apr 16, 2026

bjosv merged commit 61b27c4 into valkey-io:main Apr 16, 2026
48 checks passed

bjosv deleted the connect-fallback branch April 16, 2026 08:58

bjosv merged 4 commits into valkey-io:main from bjosv:connect-fallback
