Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs mainly caused by snapshot #118

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

tangruize
Copy link

@tangruize tangruize commented Aug 27, 2021

Hello! I found some catastrophic consequences mainly caused by snapshot. Maybe they are bugs or misuses of the library. If some of them are bugs, it can cause data loss, log inconsistency and progress problems.

1-6 are related to snapshot. And 7-9 are other issues. The modifications of this PR may be too much. I can divide them into small fixes in different branches.

  1. raft_recv_appendentries#L432 when ae->prev_log_idx==0, entries will be appended unconditionally. However, log may have been compacted and corresponding compacted AE entries will be appended to log again (maybe caused by a network packet duplication or a leader heartbeat). See testcase TestCluster_follower_log_no_more_than_leader_after_ae_success. This can further lead to log inconsistency, such as committed log not replicated to majority servers (see testcase TestCluster_trace_0_81007_inv_2_committed_log_replicated_majority)

  2. raft_send_appendentries#L901 AE with empty entries can be sent while it has log entries to send, causing the Follower to update the commit idx incorrectly. See testcase TestCluster_ae_should_contain_entries_if_not_synchronized. This can further lead to log inconsistency, such as committed log being rolled back and triggering code assertion. See testcase TestCluster_trace_4_63114_inv_3_committed_log_is_durable and TestCluster_trace_3_99878_assertion_set_commit_idx_le_current_idx.

  3. raft_begin_load_snapshot#L1377 log may mismatch. It is necessary to load the snapshot in this case. It causes the server lagged behind until receiving next snapshot. See testcase TestCluster_handle_snapshot_make_progress.

  4. raft_begin_load_snapshot#L1383 changing current_term directly is dangerous. It causes current term is not monotonic. See testcase TestCluster_current_term_is_monotonic. Can lead to two leaders in one term.

  5. raft_recv_appendentries#L452 log at prev_log_idx may have been compacted. In this case, compacted logs should be treated as committed logs and remaining matching logs should be appended to log. This causes performance degradation. See testcase TestCluster_ae_retry_once

  6. raft_begin_load_snapshot#L1407 memory leak found by valgrind. The pointer reference of me->nodes[0] is definitely lost if me->nodes[0] is not the me node. I noticed that @yossigo fixed this bug (PR Fix memory leak on snapshot loading. #98 ).

  7. raft_recv_appendentries_response#L317 next_idx can be decreased to equal match idx, making matched idx retransmitted. It does not seem harmful. However, it is better not to retransmit already matched logs. See testcase TestCluster_next_idx_greater_than_match_idx. I also noticed that it is introduced by a potential bug fix AE error response not properly handled. #97 . The case "a node that is not monotonic" may be caused by network failure and another issue (described in 6.)

  8. raft_send_appendentries_all#L952 it returns immediately if raft_send_appendentries returns a non-zero value. However, the return value maybe RAFT_ERR_NEEDS_SNAPSHOT. It seems that sending AE to different servers should be irrelevant. Plus 3. mentioned above, it can cause the entire cluster to fail to progress. See testcase TestCluster_handle_snapshot_make_progress. I noticed that issue raft_send_appendentries_all logic #79 also mentioned it.

  9. raft_get_last_log_term#L216 current_idx maybe a snapshot, in this condition, returning 0 causing other servers do not grant vote. If all server in the cluster's log are compacted, then no Leader can be elected. See testcase TestCluster_will_exist_leader

After fixing above issues, some tests are violated.

I find a testcase's name is TestRaft_leader_sends_appendentries_when_node_next_index_was_compacted. It seems that sending AE when next index is compacted is a required behavior? If so, it may be a misuse of the library.

I find the testcase TestRaft_leader_sends_appendentries_with_correct_prev_log_idx_when_snapshotted gets node by array index directly (Seems that nodes should not be freed). Maybe I misuses the raft_begin_load_snapshot API.

I look forward to your feedback! Thank you.

.gitignore: *.a *.so *.gcda .hypothesis *.gcov CLinkedListQueue/ tests/main_test.c tests_main python
make clean: GCOV_OUTPUT tests.c CLinkedListQueue .hypothesis
Makefile CFLAGS: add override keyword to override cmd assignment
virtraft2.py: remove namedtuple verbose argument (removed since python3.7)
test_cluster.h: virtual network, raft controller, fault injection, and invariant checking functions
test_cluster.c: testcases that violate invariants (constructed from TLA+ traces)
test_cluster_more.c: generated testcases
1. (critical) raft_recv_appendentries: in the case ae->prev_log_idx==0 and
log has been compacted, causing logs that have been compacted appended to log
again (further leading to log inconsistency, such as committed log not
replicated to majority servers).

2. raft_recv_appendentries_response: next_idx can be decreased to equal match
idx, making matched idx retransmitted

3. (critical) raft_send_appendentries: empty AE can be sent while it has log
entries to send, causing the Follower to update the commit idx incorrectly
(further leading to log inconsistency, such as committed log being rolled back)

4. (critical) raft_begin_load_snapshot: log may mismatch. It is necessary to
load the snapshot in this case. (causing the server lagged behind until
receiving next snapshot)

5. (critical) raft_begin_load_snapshot: changes current_term directly, which is
dangerous. And can lead to two leaders in one term.

6. (critical) raft_recv_appendentries: log at prev_log_idx may have been
compacted. In this case, compacted log should be treated as committed logs and
remaining matching logs should be appended to log.

7. (critical) raft_begin_load_snapshot memory leak. The pointer reference of
me->nodes[0] is definitely lost if me->nodes[0] is not the me node

8. raft_send_appendentries_all returns immediately if raft_send_appendentries
returns a non-zero value. However, the return value maybe
RAFT_ERR_NEEDS_SNAPSHOT. It seems that sending AE to different servers should
be irrelevant. (plus bug 4, it can cause the entire cluster to fail to progress)

9. (critical) raft_get_last_log_term current_idx maybe a snapshot, in this
condition, returning 0 causing other servers not grant vote. If all server in
the cluster's log are compacted, then no Leader can be elected.
test_server.c: ae.prev_log_term should be 0 if ae.prev_log_idx is 0.

TestRaft_follower_load_from_snapshot committed entry confict

TestRaft_follower_load_from_snapshot_does_not_break_cluster_safety
log mismatch

TestRaft_leader_sends_snapshot_when_node_next_index_was_compacted
should send snapshot if next idx was compacted

TestRaft_leader_not_send_appendentries_when_snapshotted
should send snapshot

TestRaft_leader_sends_snapshot_if_log_was_compacted
it is ok to send snapshot when request timeout
@tangruize tangruize changed the title Issues found by TLA+ Fix bugs mainly caused by snapshot Sep 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant