Quorum not being reached on machines with identical IDs #2767

Closed
abuehrle opened this issue Feb 2, 2017 · 6 comments

@abuehrle
Contributor

abuehrle commented Feb 2, 2017

I'm running Weave Net in Kubernetes on bare metal. I set IPALLOC_RANGE in weave-daemonset.yaml as follows:

containers:
        - name: weave
          env:
          - name: IPALLOC_RANGE
            value: 192.168.0.0/16
          image: weaveworks/weave-kube:latest
          imagePullPolicy: Always

The logs for all three containers report the same peer name collision:

INFO: 2017/02/02 20:24:09.535473 ->[147.75.100.177:56831|ea:ba:c8:b5:52:f9(kube-node-2.local.lan)]: connection shutting down due to error: local "ea:ba:c8:b5:52:f9(kube-node-2.local.lan)" and remote "ea:ba:c8:b5:52:f9(kube-node-1.local.lan)" peer names collision
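A minimal sketch (not from the original report) of pulling the two colliding peers out of log lines like the one above. The function name `collision_peers` is hypothetical, and the pattern assumes the message format is exactly as shown in this log line.

```shell
#!/bin/sh
# Read Weave log lines on stdin and, for each "peer names collision" message,
# print the local and remote peer on one line.
# Assumption: the message layout matches the log line quoted in this issue.
collision_peers() {
    sed -n 's/.*local "\([^"]*\)" and remote "\([^"]*\)" peer names collision.*/local=\1 remote=\2/p'
}

# Usage (the pod name is a placeholder):
#   kubectl logs -n kube-system <weave-pod> -c weave | collision_peers
```

If both peers carry the same name prefix but different hostnames, as in the log above, the nodes are deriving the same peer name from something they share.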

I can see all of the weave containers in the container view (associated with each of the three hosts) as shown below:

[Screenshot, 2017-02-02 2:40 pm: container view showing the weave containers on each of the three hosts]

But when I switch to the Weave Net view, all I see is an error and the status "waiting for quorum":

[Screenshot, 2017-02-02 2:57 pm: Weave Net view showing the error and the waiting-for-quorum status]

@bboreham
Contributor

bboreham commented Feb 3, 2017

@abuehrle this looks like #2427, which should be fixed in the latest version, and that is the version you are running.

Is it possible it's picking up persisted data from a 1.8 install?

@abuehrle
Contributor Author

abuehrle commented Feb 3, 2017

This was the result of all nodes having the same machine-id. The fix was to run the following on each node:

rm /etc/machine-id
systemd-machine-id-setup

@abuehrle abuehrle closed this as completed Feb 3, 2017
@bboreham bboreham changed the title from "Quorum not being reached" to "Quorum not being reached on machines with identical IDs" Feb 6, 2017
@weitzj

weitzj commented Feb 20, 2017

It is probably interesting to see how Scaleway (Online.net) does this in their own images:

systemd-machine-id-setup does not work and will always return the same ID.

https://github.com/scaleway/image-voidlinux/blob/master/overlay-image-tools/usr/local/sbin/scw-gen-machine-id

#!/bin/sh
# description "generate a unique machine id"
# author "Scaleway <opensource@scaleway.com>"

if [ -f /etc/.regen-machine-id ]
then
	uuidgen > /etc/machine-id
	rm -f /etc/.regen-machine-id
fi

@weitzj

weitzj commented Feb 20, 2017

Actually, an update to the above comment:

If you look at this diff https://github.com/scaleway/image-ubuntu/commit/d33d48a7e056b1e8a16cd129411872ff743f38fe it seems like you have to remove both /etc/machine-id and /var/lib/dbus/machine-id.
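Combining the two removals with ID generation could look like the sketch below. It is an illustration under stated assumptions, not Scaleway's script: the function name `regen_machine_id` and its ROOT argument are hypothetical, and it reads random bytes from /dev/urandom instead of calling uuidgen so that the result is 32 lowercase hex characters without dashes, the format machine-id(5) describes. It does not recreate /var/lib/dbus/machine-id, since the thread does not say how that file should be restored.

```shell
#!/bin/sh
# Remove both machine-id files under ROOT, write a fresh random ID to
# ROOT/etc/machine-id, and print the new ID.
# ROOT is required so the function can be tried against a scratch directory
# first; pass "/" (as root) to act on the live system.
regen_machine_id() {
    root="${1:?usage: regen_machine_id ROOT}"
    rm -f "$root/etc/machine-id" "$root/var/lib/dbus/machine-id"
    mkdir -p "$root/etc"
    # 16 random bytes rendered as 32 lowercase hex characters.
    new_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s\n' "$new_id" > "$root/etc/machine-id"
    printf '%s\n' "$new_id"
}

# Usage: try it on a scratch directory before touching a real node.
#   regen_machine_id /tmp/scratch-root
```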

@bboreham
Contributor

Did you ever hear back from your machine provider about how they expected users to deal with this, @abuehrle?

@abuehrle
Contributor Author

Yes, they fixed a bug in the way they were provisioning machines, so this should no longer occur.
