Try all addresses from DNS before failing to connect #300

Merged
bjosv merged 4 commits into valkey-io:main from bjosv:connect-fallback
Apr 16, 2026

Collaborator

@bjosv bjosv commented Apr 13, 2026

When a hostname resolves to multiple addresses, libvalkey now iterates through all of them before giving up.
Previously only EHOSTUNREACH would try the next address; all other connect errors failed immediately.

The connect_timeout applies per address, not to the overall attempt.
With N addresses that all time out, the total wait is up to N * timeout.
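
To make the bound concrete, here is a minimal sketch of the arithmetic. The function name and the millisecond unit are assumptions chosen for illustration; nothing like this exists in libvalkey.

```c
#include <stddef.h>

/* Upper bound on the total connect wait when the timeout is applied to
 * each address separately. Hypothetical helper, not part of libvalkey. */
long worst_case_wait_ms(size_t num_addresses, long timeout_ms) {
    return (long)num_addresses * timeout_ms;
}
```

For example, a hostname that resolves to three unresponsive addresses with a 2000 ms connect timeout can block for up to 6000 ms before the connect attempt fails.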

The first commit refactors the connect error handling to prepare for the change:

  • Removes the confusing wait_for_ready goto label
  • Fixes TCP_NODELAY not being set on immediate connect success
  • Reorders errno checks to put EINPROGRESS first
  • Makes each error branch self-contained

The second commit adds the fallback behavior and a unit test that overrides getaddrinfo and connect to verify that the library falls back to a working address after earlier ones fail with ECONNREFUSED.

bjosv added 2 commits April 13, 2026 16:20
Fixes:
- Fix TCP_NODELAY not being set on immediate connect success (blocking mode)

Improvements:
- Close socket on EADDRNOTAVAIL when retry limit is exhausted
- Remove confusing wait_for_ready goto label - each error case is now self-contained
- Make the else branch explicitly report the error instead of misusing valkeyContextWaitReady as an error reporter
- Reorder errno checks to put EINPROGRESS (the common case) first

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
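
As a rough illustration of what "self-contained" branches can look like, the sketch below classifies the result of a connect() call. The enum and function names are hypothetical and the real handling in src/net.c covers more cases, such as the EADDRNOTAVAIL retry.

```c
#include <errno.h>

/* Hypothetical outcome of a single connect() call. */
enum connect_outcome {
    CONNECT_DONE,        /* connected immediately; socket options such as
                            TCP_NODELAY still have to be applied */
    CONNECT_IN_PROGRESS, /* non-blocking connect started; wait for it */
    CONNECT_FAILED       /* report the error; nothing else to do here */
};

/* rc is the return value of connect(), err the errno observed after it. */
enum connect_outcome classify_connect(int rc, int err) {
    if (rc == 0)
        return CONNECT_DONE;
    if (err == EINPROGRESS) /* the common case, so it is checked first */
        return CONNECT_IN_PROGRESS;
    return CONNECT_FAILED;
}
```

Each outcome maps to exactly one follow-up action, so no branch needs to jump into another one.
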
When a hostname resolves to multiple addresses, iterate through all of them
before giving up. Previously only EHOSTUNREACH would try the next address; all
other connect errors failed immediately.

Now any connect failure (ECONNREFUSED, ETIMEDOUT, ENETUNREACH, etc.) tries the
next address in the list. Only socket option errors (setsockopt, fcntl) fail
immediately since those are system-level and won't be fixed by a different IP.

Note: the user-provided connect timeout applies per address, not to the overall
attempt. With N addresses that all time out, the total wait is up to N ×
timeout.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Collaborator Author

bjosv commented Apr 13, 2026

Digging through the hiredis history, I found that the EHOSTUNREACH handling was added for dual-stack systems where getaddrinfo returns an IPv6 address before IPv4. If the host has no IPv6 connectivity, connect() to the IPv6 address fails with EHOSTUNREACH; c4ed06d fixed only that case.

With this change we support use cases such as connecting to a service when running in K8s.

Comment thread src/net.c
@michael-grunder
Collaborator

This is much nicer. That jump to wait_for_ready inside an else always confused me 😄

Update the test case to catch this case and change the connect errno
to match real failures.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Comment thread src/valkey.c
Comment on lines +712 to +715
void valkeyClearError(valkeyContext *c) {
    c->err = 0;
    memset(c->errstr, '\0', strlen(c->errstr));
}
Collaborator

We do this in various places. Should we use the function always?

Collaborator Author

Not that many actually, but it would be good to use this function always, like in valkeyReconnect.

Comment thread src/valkey.c
Comment on lines 696 to 710
void valkeySetError(valkeyContext *c, int type, const char *str) {
    size_t len;

    c->err = type;
    if (str != NULL) {
        len = strlen(str);
        len = len < (sizeof(c->errstr) - 1) ? len : (sizeof(c->errstr) - 1);
        memcpy(c->errstr, str, len);
        c->errstr[len] = '\0';
    } else {
        /* Only VALKEY_ERR_IO may lack a description! */
        assert(type == VALKEY_ERR_IO);
        strerror_r(errno, c->errstr, sizeof(c->errstr));
    }
}
Collaborator

I see that valkeySetErrorFromErrno is very similar to this special case. We could sort out the different cases, but that might be out of scope for this PR.

Collaborator Author

Ah, yes, good idea. That can go in another PR, where we also use valkeyClearError as mentioned in the other comment.
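
For readers following the two threads, the set/clear pair under discussion can be illustrated with a standalone sketch. `struct demo_ctx` and both functions are hypothetical stand-ins, not libvalkey's types or API, and the clear helper zeroes the whole buffer, which is a variation on the snippet quoted above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the err/errstr fields of a context. */
struct demo_ctx {
    int err;
    char errstr[16];
};

/* Copy str into the fixed-size buffer, truncating so that the result is
 * always NUL-terminated. str must not be NULL in this sketch. */
void demo_set_error(struct demo_ctx *c, int type, const char *str) {
    size_t len = strlen(str);
    if (len > sizeof(c->errstr) - 1)
        len = sizeof(c->errstr) - 1;
    memcpy(c->errstr, str, len);
    c->errstr[len] = '\0';
    c->err = type;
}

/* Reset both fields so that a stale error cannot leak into a new attempt. */
void demo_clear_error(struct demo_ctx *c) {
    c->err = 0;
    memset(c->errstr, 0, sizeof(c->errstr));
}
```

Routing every reset through one helper, as suggested in the review, keeps the error code and the message from getting out of sync.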

@bjosv bjosv merged commit 61b27c4 into valkey-io:main Apr 16, 2026
48 checks passed
@bjosv bjosv deleted the connect-fallback branch April 16, 2026 08:58
3 participants