Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AE error response not properly handled. #97

Merged
merged 2 commits into from
Oct 6, 2018

Conversation

yossigo
Copy link
Contributor

@yossigo yossigo commented Sep 27, 2018

Errors were ignored if node's match_idx==next_idx-1, I assume (??) as a
precaution against silently feeding a node that is not monotonic.

However this does not consider the case where send_appendentries()
relies on snapshot_last_idx which may be greater than the node's index.

@willemt can you think of unwanted side effects for this?

Errors were ignored if node's match_idx==next_idx-1, I assume (??) as a
precaution against silently feeding a node that is not monotonic.

However this does not consider the case where send_appendentries()
relies on snapshot_last_idx which may be greater than the node's index.
To clarify, the desired behavior is:
1) Avoid false assertions.
2) Silently ignore AE responses which are stale or not monotonic (i.e.
current index < match index).
@willemt willemt merged commit 4ddc273 into willemt:master Oct 6, 2018
@willemt
Copy link
Owner

willemt commented Oct 6, 2018

This is a good fix. It matches the semantics of the raft paper better.
Confirmed in virtraft that nothing breaks.

@tangruize
Copy link

Hello! I found this fix may cause raft_recv_appendentries_response() to send an empty retry appendentries. The main reason is that stale msgs are not all filtered out.

Consider this trace in a 3-server cluster:
S1 becomes Leader (term 1) -> S1 sends heartbeat msg m1 to S2 -> S1 restarts and become Follower -> S1 times out and starts a new election (term 2) -> S2 votes for S1 and updates term to 2 -> S1 becomes Leader again (term 2) -> S2 receives msg m1 and replies msg m2: [term:2, success: false, current_index: 0] because of smaller term -> S1 receives msg m2, but m2 is not filtered out as a stale msg, and S1 replies an empty retry appendentries.

However, this is not even a bug. I just think it is inconsistent with the semantics that the program wants to express.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants