Quorum not being reached on machines with identical IDs #2767

abuehrle · 2017-02-02T20:59:08Z

I'm running Weave Net in Kubernetes on bare metal. I set IPALLOC_RANGE in weave-daemonset.yaml as follows:

containers:
  - name: weave
    env:
      - name: IPALLOC_RANGE
        value: 192.168.0.0/16
    image: weaveworks/weave-kube:latest
    imagePullPolicy: Always

The logs of all three weave containers report the same collision:

INFO: 2017/02/02 20:24:09.535473 ->[147.75.100.177:56831|ea:ba:c8:b5:52:f9(kube-node-2.local.lan)]: connection shutting down due to error: local "ea:ba:c8:b5:52:f9(kube-node-2.local.lan)" and remote "ea:ba:c8:b5:52:f9(kube-node-1.local.lan)" peer names collision

In the container view I can see all of the weave containers, one on each of the three hosts.

But when I switch to the Weave Net view, all I see is an error and the status "waiting for quorum".


bboreham · 2017-02-03T14:58:08Z

@abuehrle this looks like #2427, which should be fixed in the latest version, and that is the version you are running.

Is it possible it's picking up persisted data from a 1.8 install?

abuehrle · 2017-02-03T16:23:21Z

This was caused by all nodes having the same machine-id. The fix was to run the following on each node:

rm /etc/machine-id
systemd-machine-id-setup
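To confirm that nodes really share a machine-id (before or after running the fix above), a minimal sketch could look like this. It assumes you have already collected one "hostname machine-id" pair per line, for example by reading /etc/machine-id on each node over ssh; the function name find_duplicate_machine_ids and the node hostnames in the comment are hypothetical.

```shell
#!/bin/sh
# Sketch: report machine-ids that are shared by more than one host.
# Input on stdin: one "hostname machine-id" pair per line. One possible way to
# collect it (hypothetical hostnames, assumes ssh access to every node):
#   for h in kube-node-1.local.lan kube-node-2.local.lan; do
#     echo "$h $(ssh "$h" cat /etc/machine-id)"
#   done | find_duplicate_machine_ids
# Prints "duplicate <id>: <hosts>" for each shared ID and returns 1 if any
# duplicate was found, 0 otherwise.
find_duplicate_machine_ids() {
  awk '
    { hosts[$2] = hosts[$2] " " $1; count[$2]++ }
    END {
      status = 0
      for (id in count)
        if (count[id] > 1) { print "duplicate " id ":" hosts[id]; status = 1 }
      exit status
    }'
}
```

A non-zero exit status makes it easy to use as a pre-flight check before installing Weave Net on freshly provisioned machines.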

weitzj · 2017-02-20T21:05:42Z

It is probably worth looking at how Scaleway (Online.net) handles this in their own images:

systemd-machine-id-setup does not work and always returns the same ID.

https://github.com/scaleway/image-voidlinux/blob/master/overlay-image-tools/usr/local/sbin/scw-gen-machine-id

#!/bin/sh
# description "generate a unique machine id"
# author "Scaleway <opensource@scaleway.com>"

if [ -f /etc/.regen-machine-id ]
then
	uuidgen > /etc/machine-id
	rm -f /etc/.regen-machine-id
fi

weitzj · 2017-02-20T21:15:57Z

An update to my comment above:

Judging from this diff https://github.com/scaleway/image-ubuntu/commit/d33d48a7e056b1e8a16cd129411872ff743f38fe, it seems you have to remove both /etc/machine-id and /var/lib/dbus/machine-id.
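A minimal sketch of the same idea follows, assuming /var/lib/dbus/machine-id should simply mirror /etc/machine-id. Instead of uuidgen, whose dashed output does not match the 32-character lowercase hex format described in machine-id(5), it reads 16 random bytes from /dev/urandom, and it links the D-Bus copy to the new file so the two cannot drift apart again. The function name regen_machine_id and its optional root-directory argument are hypothetical; the argument only exists so the function can be tried on a scratch directory. On a real node it would be called with no argument, as root, and anything that already read the old ID (such as a running weave container) would presumably need restarting.

```shell
#!/bin/sh
# Sketch: give a cloned node a fresh machine-id and keep the D-Bus copy in sync.
# regen_machine_id and its optional ROOT argument are hypothetical; ROOT only
# exists so the function can be tried on a scratch directory. With no argument
# it operates on the real /etc and /var/lib/dbus and must run as root.
regen_machine_id() {
  root=${1:-}
  etc_id="$root/etc/machine-id"
  dbus_id="$root/var/lib/dbus/machine-id"

  # machine-id(5): 32 lowercase hexadecimal characters plus a newline.
  # od prints 16 random bytes as lowercase hex; tr strips spaces and newline.
  new_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
  [ "${#new_id}" -eq 32 ] || return 1

  # Remove both copies, as in the Scaleway diff, then write the new ID.
  rm -f "$etc_id" "$dbus_id"
  printf '%s\n' "$new_id" > "$etc_id" || return 1

  # Relative link from var/lib/dbus/ to ../../../etc/machine-id, so it
  # resolves correctly both under a scratch ROOT and on the real filesystem.
  mkdir -p "$(dirname "$dbus_id")" || return 1
  ln -s ../../../etc/machine-id "$dbus_id"
}
```

The symlink is a design choice, not something the linked diff shows: it keeps a single source of truth, whereas writing the same ID into both files would also work but could diverge if one file is later regenerated on its own.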

bboreham · 2017-02-21T09:01:27Z

Did you ever hear back from your machine provider about how they expected users to deal with this, @abuehrle?

abuehrle · 2017-02-21T09:12:08Z

Yes, they fixed a bug in the way they were provisioning machines, so this should no longer occur.

abuehrle closed this as completed Feb 3, 2017

bboreham changed the title from "Quorum not being reached" to "Quorum not being reached on machines with identical IDs" Feb 6, 2017

bboreham added this to the n/a milestone Feb 22, 2017

bboreham added the resolution/not_our_problem label Feb 22, 2017

lzecca78 mentioned this issue Nov 20, 2019 in prometheus/node_exporter#1546, "Add machine_id value information to collectors" (closed)
