Bug description
Shared cluster deployment fails while trying to resolve the vCenter address, with the following message:
ERROR :Failed to deploy cluster Failed Error: unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'VCenterUnreachable', message:'Post "https://vcenter01.syangsao.lab/sdk": dial tcp: lookup vcenter01.syangsao.lab on 100.64.0.10:53: no such host'
Error: exit status 1
Affected product modules (please put an X in all that apply)
SIVT APIs
SIVT UI
SIVT CLI
Docs
Expected behavior
The shared cluster installation should complete, using the DNS server that was configured via the SIVT UI. That DNS server, as configured on the SIVT host, resolves the vCenter address correctly, and the same DNS server is used to validate the AVI hostname during the initial cluster setup.
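A quick way to verify this independently of the local resolver order is to query the configured server directly with dig; 192.168.40.2 is the DNS server configured via the SIVT UI in this environment, and 192.168.40.14 is the vCenter address it should return:
# Ask the SIVT-configured DNS server directly, bypassing the host's stub resolver
dig @192.168.40.2 vcenter01.syangsao.lab +short
# Reverse lookup of the vCenter IP against the same server
dig @192.168.40.2 -x 192.168.40.14 +short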
Steps to reproduce the bug
This happens consistently during the shared cluster installation. Occasionally the shared cluster installation succeeds, but the same error then appears during the subsequent workload cluster installation. It is unclear how to debug this or why the vCenter address is reported as unreachable.
Version (include the SHA if the version is not obvious)
Environment where the bug was observed (vSphere+VMC, vSphere+DVS, vSphere+NSXt, etc)
vSphere+DVS+AVI
kubectl version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9+vmware.1", GitCommit:"21eeb4527eefb360eb251addc358cea6997e8335", GitTreeState:"clean", BuildDate:"2022-05-04T00:18:36Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version:
Cloud provider or hardware configuration: Dell
OS (e.g. from /etc/os-release):
NAME="VMware Photon OS"
VERSION="3.0"
ID=photon
VERSION_ID=3.0
PRETTY_NAME="VMware Photon OS/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://vmware.github.io/photon/"
BUG_REPORT_URL="https://github.com/vmware/photon/issues"
Relevant Debug Output (Logs, manifests, etc)
The SIVT host resolves the vCenter address correctly (forward and reverse lookups both succeed), so it is not clear why the installation fails or where to check how the lookups are being performed.
root@service13 [ ~ ]# ping vcenter01.syangsao.lab
PING vcenter01.syangsao.lab (192.168.40.14) 56(84) bytes of data.
64 bytes from vcenter01.syangsao.lab (192.168.40.14): icmp_seq=1 ttl=64 time=0.139 ms
64 bytes from vcenter01.syangsao.lab (192.168.40.14): icmp_seq=2 ttl=64 time=0.110 ms
^C
--- vcenter01.syangsao.lab ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 0.110/0.124/0.139/0.018 ms
Local resolver is configured properly on SIVT host
root@service13 [ ~ ]# resolvectl |more
Global
LLMNR setting: no
MulticastDNS setting: yes
DNSOverTLS setting: opportunistic
DNSSEC setting: no
DNSSEC supported: no
Current DNS Server: 192.168.40.2
DNS Servers: 192.168.40.2
Fallback DNS Servers: 8.8.8.8
8.8.4.4
[...]
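The failing lookup in the error above goes to 100.64.0.10:53, which looks like the in-cluster DNS service rather than the SIVT host's resolver, so it is also worth checking resolution from the cluster that is being provisioned. A minimal sketch, assuming SSH access to a node as the capv user and a management-cluster kubeconfig on hand (the node IP, kubeconfig path, pod name, and image tag below are placeholders):
# On a cluster node: confirm which DNS servers the node actually picked up from DHCP
ssh capv@<node-ip> 'cat /run/systemd/resolve/resolv.conf; resolvectl status'
# From inside the cluster: resolve the vCenter FQDN through the cluster DNS
kubectl --kubeconfig <mgmt-kubeconfig> run dns-test --image=busybox:1.36 \
  --restart=Never --rm -it -- nslookup vcenter01.syangsao.lab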
OK, so long story short: DHCP was handing out two DNS server addresses:
192.168.40.2 - can resolve vcenter01.syangsao.lab
192.168.1.1 - cannot resolve vcenter01.syangsao.lab; only intended as a backup if the first one fails
When a Linux host boots on the TKG management or workload subnets, it sometimes swaps the order of these two servers, even though DHCP hands them out with 192.168.40.2 first and 192.168.1.1 second. That explains the intermittent failures: the shared cluster nodes sometimes came up with the correct order, while the workload cluster nodes got the reversed order and ended up querying the DNS server that cannot resolve the vCenter address.
I confirmed this behaviour by booting a separate Linux host on both subnets, which is how I spotted the DNS servers flipping to the wrong order. The fix is simply to remove the second DNS entry from the DHCP scope, since the Linux hosts occasionally reorder the servers instead of honouring the order DHCP provides.
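A quick way to confirm a flip like this, assuming a scratch Linux VM on the affected subnet (the interface name eth0 is a placeholder), is to compare the per-link DNS order reported by systemd-resolved against what each DHCP-provided server can actually answer:
# Order of DNS servers systemd-resolved is actually using on the DHCP interface
resolvectl dns eth0
# Query each server directly; only 192.168.40.2 should return the vCenter record
dig @192.168.40.2 vcenter01.syangsao.lab +short
dig @192.168.1.1 vcenter01.syangsao.lab +short
If resolvectl lists 192.168.1.1 first, the node fails exactly as described above; removing the second DNS option from the DHCP scope eliminates the ambiguity entirely.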