Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race in membership and ring consistency #112

Merged
merged 1 commit into from
Feb 23, 2016
Merged

Conversation

motiejus
Copy link
Contributor

Without this change, membership and ring could be not in sync. Scenario:
A - current node.
B - alive (according to C).
B - suspect (according to D).
C - thinks B is faulty.
D - thinks B is alive.

At the same time:

  1. C gossips A that B is faulty. A removes B to the membership list.
  2. D gossips A that B is alive. A adds B from the membership list.

Changes in the ring:
3. The lock on m.members is released, so updates on Ring due to (1) and
(2) can execute in any order. In our example, the operations are
reversed:
3.1. A adds B to the ring list due to (2).
3.2. A removes B to the ring list due to (1).

Now B is in the membership list, but not in the ring.

This is the second iteration of the change. First iteration did not
introduce a new lock, but a new invariant: a goroutine needs to lock
both memberlist and disseminator, lock memberlist first, then
disseminator. However, the invariant caused requestAdminStats to fail on
travis-ci, and I couldn't find out why.

Without this change, membership and ring could be not in sync. Scenario:
A - current node.
B - alive (according to C).
B - suspect (according to D).
C - thinks B is faulty.
D - thinks B is alive.

At the same time:
1. C gossips A that B is faulty. A removes B to the membership list.
2. D gossips A that B is alive. A adds B from the membership list.

Changes in the ring:
3. The lock on m.members is released, so updates on Ring due to (1) and
   (2) can execute in any order. In our example, the operations are
   reversed:
3.1. A adds B to the ring list due to (2).
3.2. A removes B to the ring list due to (1).

Now B is in the membership list, but not in the ring.

This is the second iteration of the change. First iteration did not
introduce a new lock, but a new invariant: a goroutine needs to lock
both memberlist and disseminator, lock memberlist first, then
disseminator. However, the invariant caused requestAdminStats to fail on
travis-ci, and I couldn't find out why.
@danielheller
Copy link

lgtm

@danielheller
Copy link

Oops, wrong button, sorry Motiejus. lgtm!

@thanodnl
Copy link
Contributor

lgtm

1 similar comment
@CorgiMan
Copy link
Contributor

lgtm

motiejus added a commit that referenced this pull request Feb 23, 2016
Fix race in membership and ring consistency
@motiejus motiejus merged commit 784d4e0 into dev Feb 23, 2016
@motiejus motiejus deleted the bug/membership-race2 branch February 23, 2016 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants