
OpenShift: NOK DNS causes FAILED - RETRYING: Wait for control plane pods to appear (... retries left) #4

Open
vorburger opened this issue Dec 10, 2018 · 5 comments

Comments

@vorburger (Owner)
Gets stuck at this for 40 minutes (then I gave up and hit Ctrl-C):

Monday 10 December 2018  14:51:58 +0000 (0:00:00.044)       0:04:35.053 ******* 
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
...

Need to dig into what the root cause of this is ...

@vorburger (Owner, Author) commented Dec 10, 2018

[centos@openshift-master ~]$ sudo docker images
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
docker.io/openshift/origin-node            v3.11.0             09155f3d6e1c        4 days ago          1.16 GB
docker.io/openshift/origin-control-plane   v3.11.0             571bf0129014        4 days ago          825 MB
docker.io/openshift/origin-pod             v3.11.0             842871e974c0        4 days ago          258 MB
quay.io/coreos/etcd                        v3.2.22             ff5dd2137a4f        6 months ago        37.3 MB

[centos@openshift-master ~]$ sudo docker ps
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
1dead8a3ba40        571bf0129014                             "/bin/bash -c '#!/..."   2 hours ago         Up 2 hours                              k8s_controllers_master-controllers-openshift-master.rdocloud_kube-system_65900fff2b6e1f76c768d747ff1e53f6_0
698b4e1af0b1        docker.io/openshift/origin-pod:v3.11.0   "/usr/bin/pod"           2 hours ago         Up 2 hours                              k8s_POD_master-etcd-openshift-master.rdocloud_kube-system_f577aa512ca7d68d2d4318b8a7884993_0
ce30cb6bb3d3        docker.io/openshift/origin-pod:v3.11.0   "/usr/bin/pod"           2 hours ago         Up 2 hours                              k8s_POD_master-controllers-openshift-master.rdocloud_kube-system_65900fff2b6e1f76c768d747ff1e53f6_0
04047bfde971        docker.io/openshift/origin-pod:v3.11.0   "/usr/bin/pod"           2 hours ago         Up 2 hours                              k8s_POD_master-api-openshift-master.rdocloud_kube-system_60f548cd1d82d290eb6882da121098d3_0

[centos@openshift-master ~]$ sudo docker logs -f --tail 100 1dead8a3ba40
E1210 17:14:51.827391       1 leaderelection.go:234] error retrieving resource lock kube-system/kube-controller-manager: Get https://openshift-master.rdocloud:8443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager: dial tcp 198.105.244.11:8443: i/o timeout
E1210 17:14:54.922683       1 reflector.go:136] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ReplicationController: Get https://openshift-master.rdocloud:8443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 198.105.254.11:8443: i/o timeout

It looks like it expects working DNS for the master and node hostnames?

In OpenStack VMs out of the box there is no internal DNS for VM names.

@vorburger (Owner, Author)
> In OpenStack VMs out of the box there is no internal DNS for VM names.

Actually it's a bit more interesting than that... check this out:

[centos@openshift-master ~]$ cat /etc/resolv.conf
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local rdocloud
nameserver 192.168.0.11

[centos@openshift-master ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.0.11  netmask 255.255.255.0  broadcast 192.168.0.255

[centos@openshift-master ~]$ sudo yum install -y bind-utils
[centos@openshift-master ~]$ nslookup openshift-master.rdocloud
Server:		192.168.0.11
Address:	192.168.0.11#53

Non-authoritative answer:
Name:	openshift-master.rdocloud
Address: 198.105.244.11
Name:	openshift-master.rdocloud
Address: 198.105.254.11

[centos@openshift-master ~]$ ping 192.168.0.11
PING 192.168.0.11 (192.168.0.11) 56(84) bytes of data.
64 bytes from 192.168.0.11: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 192.168.0.11: icmp_seq=2 ttl=64 time=0.039 ms
^C
--- 192.168.0.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms

[centos@openshift-master ~]$ ping 198.105.254.11
PING 198.105.254.11 (198.105.254.11) 56(84) bytes of data.
^C
--- 198.105.254.11 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

So there is DNS, but for its own hostname it returns a weird 198.105.244.11 when it should be 192.168.0.11?

https://docs.okd.io/latest/install/prerequisites.html#prereq-dns partially sheds some light on the background.
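The mismatch above can be checked mechanically. A minimal sketch, hard-coding the resolved and interface addresses taken from the nslookup/ifconfig output above (in practice these would be queried at runtime):

```shell
# The address the cluster DNS returned for the master's own hostname,
# and the address actually bound to eth0 (both from the output above).
resolved_ip="198.105.254.11"
interface_ip="192.168.0.11"

# OpenShift needs the hostname to resolve to the node's own interface IP;
# flag the mismatch that makes the control plane pods time out.
if [ "$resolved_ip" != "$interface_ip" ]; then
  echo "DNS mismatch: resolved $resolved_ip, expected $interface_ip"
fi
```

This is exactly the failure visible in the controller logs above: the API client dials 198.105.244.11:8443 instead of the master's real 192.168.0.11 and times out.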

@vorburger (Owner, Author) commented Dec 10, 2018

> So there is DNS, but for its own hostname it returns a weird 198.105.244.11 when it should be 192.168.0.11?

This actually isn't really an OpenShift(-Ansible) installation issue at all. The gist of it is that even a simple ping `hostname` does not work as one would expect in a VM on the RDO Cloud.

I've raised this as tickets.osci.io #1172, but am at the same time attempting to work around it with a hack.

vorburger added a commit that referenced this issue Dec 10, 2018
@vorburger (Owner, Author)
> at the same time attempting to work around it with a hack

ee560bf adds something like the below to an ose-dnsmasq.conf file (currently named test-ose-dnsmasq.conf; will rename). Via a reference to it from openshift_node_dnsmasq_additional_config_file in the [OSEv3:vars] section of /etc/ansible/hosts, it ends up in /etc/dnsmasq.d/openshift-ansible.conf (NOT in /etc/dnsmasq.conf nor /etc/dnsmasq.d/origin-dns.conf), and this does the trick:

host-record=openshift-master.rdocloud,openshift-master,192.168.0.11
host-record=openshift-node1.rdocloud,openshift-node1,192.168.0.24
host-record=openshift-node2.rdocloud,openshift-node2,192.168.0.16
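For a larger cluster, such host-record lines could also be generated instead of written by hand. A minimal sketch (the generate_host_records helper is hypothetical, not part of openshift-ansible; the name/IP pairs are the ones from this cluster):

```shell
# Domain suffix used by the cluster nodes (from /etc/resolv.conf above).
domain="rdocloud"

# Hypothetical helper: read "name ip" pairs on stdin and emit one
# dnsmasq host-record line per node, mapping both the FQDN and the
# short name to the node's private IP.
generate_host_records() {
  while read -r name ip; do
    printf 'host-record=%s.%s,%s,%s\n' "$name" "$domain" "$name" "$ip"
  done
}

generate_host_records <<'EOF'
openshift-master 192.168.0.11
openshift-node1 192.168.0.24
openshift-node2 192.168.0.16
EOF
```

Running this prints exactly the three host-record lines shown above, ready to be appended to the ose-dnsmasq.conf file.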

@vorburger vorburger changed the title FAILED - RETRYING: Wait for control plane pods to appear (... retries left) NOK DNS causes FAILED - RETRYING: Wait for control plane pods to appear (... retries left) Dec 12, 2018
@vorburger vorburger changed the title NOK DNS causes FAILED - RETRYING: Wait for control plane pods to appear (... retries left) OpenShift: NOK DNS causes FAILED - RETRYING: Wait for control plane pods to appear (... retries left) Dec 18, 2018
@vinodmsharma

The following steps can also be used to enable upstream DNS servers to resolve hosts:

You can configure multiple upstream DNS servers through NetworkManager.
For example, if the primary DNS server is 192.168.68.68 and the secondary is 192.168.68.69, you can configure them as follows:

# nmcli con mod eth0 ipv4.dns 192.168.68.68,192.168.68.69
# systemctl restart NetworkManager
# systemctl restart dnsmasq
# cat /etc/dnsmasq.d/origin-upstream-dns.conf
server=192.168.68.68
server=192.168.68.69

Please refer to https://access.redhat.com/solutions/3609281 for more info.
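The server= lines in /etc/dnsmasq.d/origin-upstream-dns.conf above can be derived directly from the same comma-separated list passed to nmcli. A minimal sketch, using the example servers from this comment:

```shell
# Same comma-separated list as given to: nmcli con mod eth0 ipv4.dns ...
dns_list="192.168.68.68,192.168.68.69"

# Split on commas and prefix each upstream server with "server=",
# producing the dnsmasq upstream config lines shown above.
echo "$dns_list" | tr ',' '\n' | sed 's/^/server=/'
```

This keeps the nmcli setting and the dnsmasq upstream file in sync from a single source of truth.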
