Support host names in YugaByte DB #128
Possible solutions:

Preferred option

Option 1 is not preferable because a failure to resolve a single peer fails the entire Raft config update. This means that even though we could have moved forward, we would not do so until the peer is removed from the Raft quorum.

Option 2 details

Option 2 is preferable, as it works well with Kubernetes, but we would need a way to safely re-resolve the DNS name, or to remove the node from the list after a while even if that node ends up alive. Basically, if the node that was removed is added back a few seconds later, we should re-resolve it at a later point.

Assuming we leave the peer connection object as a shell, without the IP address but with just the DNS name: if the node that failed DNS resolution comes back, we may never refresh the connection to this host. The master also will not remove the shell host from the tablet Raft groups, because the shell host will already appear as a valid part of various quorums.

@spolitov's proposal: we could do asynchronous DNS resolution, so it would not block the current thread and other peers could still be resolved. Currently the DNS resolution happens while holding the lock, which is not good practice. Even in a well-configured setup, DNS resolution can add significant latency. This is acceptable when refreshing peers on becoming a new master, but not at steady state if we want to re-resolve on the fly.
cc @robertpang, who is looking at something similar.
This is still happening even after @spolitov fixed the DNS resolution and turned it async. The problem now is that, when DNS resolution fails, [...]. So in short: no DNS resolution -> no [...].

I've also figured out a way to repro this locally. Setting up [...]:

Then, with a bit of tweaking to yb-ctl to use those hostnames instead of IPs, it can be reproduced with [...].
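The exact hosts-file entries used in the local repro were not captured in this issue, but the setup described amounts to mapping local hostnames onto the loopback addresses yb-ctl normally uses. A hypothetical example (hostnames and IPs are illustrative only):

```shell
# Hypothetical /etc/hosts entries so each local node can be addressed by
# hostname instead of IP; yb-ctl binds its nodes to 127.0.0.x loopbacks.
127.0.0.1   yb-local-n1
127.0.0.2   yb-local-n2
127.0.0.3   yb-local-n3
```

Removing one of these entries while the cluster is running then simulates the DNS record disappearing, as happens when Docker removes a stopped container's DNS entry.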
Fixed.

Commit 9ba8436
Scenario
Seeing this issue when shrinking the cluster, running with yb-docker-ctl and Kubernetes.
Steps to repro
Details
Docker will stop that container but also remove the DNS entry for that node name (in this case, yb-tserver-n4). Subsequently, the Raft groups that still have that node as a peer and list it by hostname end up failing to resolve the address for this peer.

Here is a code snippet where the failure happens:
src/yb/consensus/consensus_peers.cc:457
Another node wins the election upon the node removal. The winning leader then loops through its current RaftConfig set of peers (including the dead one, as this could be a temporary failure) and tries to set up a new Proxy to each, which goes down this path of name resolution and fails.
Thanks to @bmatican for investigating/writing up a lot of this.