
resolve inconsistencies detected during ring merge #1962

Closed
rade opened this issue Feb 8, 2016 · 2 comments · Fixed by #3637

Comments

@rade
Member

rade commented Feb 8, 2016

The following are resolvable inconsistencies that currently cause a ring merge to fail, which in turn causes the connection on which the "other" ring was transmitted to be dropped...

  • we receive an entry that splits one of our ranges, giving some of it away to another peer. This is in fact acceptable provided that we have no allocations in the range that got given away.
  • we receive an entry with a token and version identical to an entry we have, but holding different content. We could apply a simple tie-break here, e.g. pick the entry with the highest free count and, if that is the same, lowest peer id. The only time we should error is when this tie-breaking ends up giving away a range we own and which has allocations.
  • we receive an entry with a token that is equal to one of the entries we have, and has a newer version, but the entry we hold belongs to us. i.e. we've received an update to one of our own tokens, which shouldn't happen. We can, however, accept the received entry if either it belongs to a different peer and we do not have allocations in the range effectively given away, or it belongs to our own peer, in which case we should set our version to the one received plus one, effectively imposing our existing entry.

The 2nd of these is particularly important to address, at least partially, since it can cause a merge to fail on a peer for entries involving two other peers. Fixing just that would entail applying the tie-break as described (a rough sketch follows below) and failing if that picks the received entry while the entry we hold belongs to us.
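
A minimal sketch of that tie-break in Go; the names (ringEntry, pickEntry) and field layout are assumptions for illustration, not the actual Weave Net ring code:

package main

import "fmt"

// ringEntry is a simplified stand-in for a ring entry: a token (range start),
// the owning peer, a version, and the reported free-address count.
type ringEntry struct {
	Token   string
	Peer    string
	Version uint32
	Free    uint32
}

// pickEntry resolves two entries that have the same token and version but
// different content: prefer the higher free count, then the lower peer id.
// It reports a conflict only when the chosen entry gives away a range we own
// that still has allocations in it.
func pickEntry(ours, theirs ringEntry, self string, haveAllocations bool) (ringEntry, bool) {
	winner := ours
	if theirs.Free > ours.Free ||
		(theirs.Free == ours.Free && theirs.Peer < ours.Peer) {
		winner = theirs
	}
	conflict := ours.Peer == self && winner.Peer != self && haveAllocations
	return winner, conflict
}

func main() {
	// Hypothetical peer names and values, purely for demonstration.
	ours := ringEntry{Token: "10.32.0.0", Peer: "aa:aa:aa:aa:aa:aa", Version: 7, Free: 3}
	theirs := ringEntry{Token: "10.32.0.0", Peer: "bb:bb:bb:bb:bb:bb", Version: 7, Free: 5}
	winner, conflict := pickEntry(ours, theirs, "aa:aa:aa:aa:aa:aa", false)
	fmt.Printf("winner=%s conflict=%v\n", winner.Peer, conflict)
}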

@murali-reddy
Contributor

I encountered a case which I think is a slight variation of the 2nd item. On a 3 node cluster, when node 62:e8:01:2a:9d:cc was deleted, 7e:ee:ea:63:fb:d1 reclaimed the IP range corresponding to the deleted node. But c6:25:1c:4d:05:67 continues to think it is owned by the deleted node. Somehow the versions ended up being the same, resulting in a conflict. I guess the tie-break could be to give priority to the node that is reachable.

/home/weave # ./weave --local status connections
-> 192.168.56.101:6783   failed      Inconsistent entries for 10.36.0.0: owned by 7e:ee:ea:63:fb:d1 but incoming message says 62:e8:01:2a:9d:cc, retry: 2019-03-18 10:48:13.615294509 +0000 UTC m=+4831.629477790 

/home/weave # ./weave --local status connections
-> 192.168.56.100:6783   failed      Inconsistent entries for 10.36.0.0: owned by 62:e8:01:2a:9d:cc but incoming message says 7e:ee:ea:63:fb:d1, retry: 2019-03-18 10:44:09.671593317 +0000 UTC m=+1856.869953487 
        "Name": "7e:ee:ea:63:fb:d1",
        "NickName": "192.168.56.100",
            {
                "Token": "10.36.0.0",
                "Size": 1,
                "Peer": "7e:ee:ea:63:fb:d1",
                "Nickname": "192.168.56.100",
                "IsKnownPeer": true,
                "Version": 25
            },
        "Name": "c6:25:1c:4d:05:67",
        "NickName": "192.168.56.101",
            {
                "Token": "10.36.0.0",
                "Size": 1,
                "Peer": "62:e8:01:2a:9d:cc",
                "Nickname": "192.168.56.102",
                "IsKnownPeer": false,
                "Version": 25
            },
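
For illustration only: if the tie-break sketched earlier were applied to the two entries in this dump (same token 10.36.0.0, same version 25) and the free counts happened to be equal, the lowest-peer-id rule would favour the deleted peer, which is presumably why a reachability-aware tie-break looks attractive here. A minimal, hypothetical check:

package main

import "fmt"

func main() {
	reclaimedBy := "7e:ee:ea:63:fb:d1" // reachable peer that reclaimed 10.36.0.0
	deleted := "62:e8:01:2a:9d:cc"     // deleted peer that c6:25:1c:4d:05:67 still records
	// With equal versions and (assumed) equal free counts, the proposed
	// lowest-peer-id rule reduces to a comparison of peer names.
	if deleted < reclaimedBy {
		fmt.Println("lowest-peer-id tie-break would pick the deleted peer:", deleted)
	}
}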

@bboreham
Contributor

The problem with "reachable" is that two peers can have different results in the case of a network partition.

In your example, a very interesting question is how the version number came to be incremented for the peer you say was deleted.
