raftstore: fix high commit log duration when adding new peer #13078
Conversation
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
PTAL @cosven
if alive_replicated_idx > p.matched && p.matched >= truncated_idx {
    alive_replicated_idx = p.matched;
} else if p.matched == 0 {
    // the new peer is still applying snapshot, do not compact cache now
What if the new peer takes a very long time to apply the snapshot? Will the cache grow until OOM?
In that case, `*last_heartbeat > cache_alive_limit` can't be met, and the cache is compacted without considering the new peer.
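For readers following along, here is a minimal, self-contained sketch of the policy under discussion; the `PeerProgress` type, the tick-based heartbeats, and the exact handling of `matched == 0` (clamping to the truncated index) are simplifications of mine, not the actual TiKV code:

```rust
// A simplified model: the alive index only moves down to a follower's `matched`
// when that follower has heartbeated recently, and a peer with `matched == 0`
// (still applying a snapshot) keeps the cache from being compacted past the
// already-truncated index.
struct PeerProgress {
    matched: u64,        // highest log index known to be replicated to the peer
    last_heartbeat: u64, // tick of the last heartbeat seen from the peer
}

fn alive_cache_idx(
    peers: &[PeerProgress],
    leader_applied_idx: u64,
    truncated_idx: u64,
    cache_alive_limit: u64, // heartbeats at or before this tick count as dead
) -> u64 {
    let mut alive_idx = leader_applied_idx;
    for p in peers {
        // Peers that have not heartbeated recently are ignored entirely, so a
        // long-dead peer cannot make the cache grow without bound.
        if p.last_heartbeat <= cache_alive_limit {
            continue;
        }
        if alive_idx > p.matched && p.matched >= truncated_idx {
            alive_idx = p.matched;
        } else if p.matched == 0 {
            // The new peer is still applying a snapshot: don't compact the
            // cache now (modelled here as clamping to the truncated index),
            // so its log gap can still be served from memory afterwards.
            alive_idx = truncated_idx;
        }
    }
    alive_idx
}

fn main() {
    let peers = vec![
        PeerProgress { matched: 0, last_heartbeat: 100 },  // new peer, applying snapshot
        PeerProgress { matched: 95, last_heartbeat: 100 }, // healthy follower
    ];
    // With the `matched == 0` guard, the cache is compacted only up to the
    // truncated index instead of all the way to the leader's applied index.
    assert_eq!(alive_cache_idx(&peers, 100, 80, 40), 80);
    println!("entry cache kept for the snapshot-applying peer");
}
```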
let rid = self.get_region_id();
if self.engines.raft.has_builtin_entry_cache() {
    self.engines.raft.gc_entry_cache(rid, idx);
}
Why delete this part? (e.g., what if raft uses RocksDB?)
Neither raft-engine nor RocksDB has a builtin entry cache now; it's deprecated.
@@ -4958,18 +4963,14 @@ where
 self.fsm
     .peer
     .mut_store()
-    .maybe_gc_cache(alive_cache_idx, applied_idx);
+    .compact_cache_to(alive_replicated_idx + 1);
I have a similar question. Will the cache cost too much memory, since it was eagerly cleaned before and now it is not?
No. If there is a large log lag, force raft log compaction will be triggered, and the cache compaction is performed as well. Check the `mut_store().compact_to()` in `on_ready_compact_log`.
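A rough sketch of that point, using a much-simplified `EntryCache` rather than the real `PeerStorage` internals: the forced-compaction path truncates the in-memory cache together with the raft log, which is what keeps memory bounded even when the periodic GC skips a peer.

```rust
// Stand-in types; the real code lives in PeerStorage and raftstore.
struct EntryCache {
    first_index: u64,
    entries: Vec<u64>, // stand-in for cached raft entries
}

impl EntryCache {
    /// Drop every cached entry with index < `idx`.
    fn compact_to(&mut self, idx: u64) {
        if idx <= self.first_index {
            return;
        }
        let drop_n = (idx - self.first_index).min(self.entries.len() as u64) as usize;
        self.entries.drain(..drop_n);
        self.first_index = idx;
    }
}

// Conceptually mirrors the handler discussed above: once the raft log is
// truncated at `truncated_idx`, the cache is truncated to the same point.
fn on_ready_compact_log(cache: &mut EntryCache, truncated_idx: u64) {
    cache.compact_to(truncated_idx + 1);
}

fn main() {
    let mut cache = EntryCache { first_index: 10, entries: (10..110).collect() };
    on_ready_compact_log(&mut cache, 59);
    assert_eq!(cache.first_index, 60);
    assert_eq!(cache.entries.len(), 50);
    println!("cache bounded after forced log compaction");
}
```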
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
if *last_heartbeat > cache_alive_limit {
    if alive_replicated_idx > p.matched && p.matched >= truncated_idx {
        alive_replicated_idx = p.matched;
    } else if p.matched == 0 {
Why check for `0`?
Please check the PR description; that's the reason why the entry cache is dropped mistakenly.
Better to wrap the commit message at 80 characters.
What if `matched` is not 0 but less than `truncated_idx`?
If it's less than `truncated_idx`, the entry cache must have already been dropped by `on_ready_compact_log`, so there is no need to consider it. It seems we should keep the name `alive_cache_idx`.
For example, if a node is always lagging behind, the leader will wait for its first snapshot and then skip all following snapshots?
Yes. If that's the case, we should adjust the force compact policy to consider in-flight snapshots. Nothing can be done here, as the cache is dropped in `on_ready_compact_log` anyway.
let rid = self.get_region_id();
self.engines.raft.gc_entry_cache(rid, apply_idx + 1);
}
if replicated_idx == apply_idx {
This is still necessary.
Compacting the cache to `min(alive_cache_idx, applied_idx)` in the latest commit seems better than this.
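A tiny illustration of that policy with a hypothetical helper (not the actual call site):

```rust
// Compact the cache only up to whichever is smaller: the index every alive
// peer has replicated, or the leader's own applied index.
fn cache_compact_point(alive_cache_idx: u64, applied_idx: u64) -> u64 {
    // +1 assuming the compaction call drops entries strictly below the given
    // index, so the entry at the minimum (already replicated) is dropped too.
    std::cmp::min(alive_cache_idx, applied_idx) + 1
}

fn main() {
    // An alive follower lags at index 80 while the leader has applied 100:
    // keep the cache from 81 on so the follower can be served from memory.
    assert_eq!(cache_compact_point(80, 100), 81);
    // Everyone has caught up: the whole cache up to applied_idx can go.
    assert_eq!(cache_compact_point(100, 100), 101);
}
```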
`alive_cache_idx` may be accessed again to find the matched entry.
What do you mean? I don't get it.
`replicated_idx + 1` may not be a good index.
I don't ever use `replicated_idx + 1`... I still don't know what your point is. As for `min(alive_cache_idx, applied_idx)`, it already covers the case when the region is inactive.
    Some(idx) => idx,
};
if cache_first_idx > replicated_idx + 1 {
    // Catching up log requires accessing fs already, let's optimize for
This is still necessary.
No, it makes things worse once the cache is dropped by mistake. `alive_cache_idx` and force compaction already exclude peers that lag too far behind; that policy is better than this.
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
/merge
@Connor1996: It seems you want to merge this PR. I will help you trigger all the tests: /run-all-tests. If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: f370239
/run-tests retry=4
/run-tests
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
cherry pick to release-6.1 in PR #13089
…#13089) close #13077, ref #13078

When adding a new peer, `alive_cache_idx` would not account for a new peer that is still applying a snapshot, so the entry cache may be compacted because `alive_cache_idx` equals `applied_idx`. After the snapshot is applied, the new peer's log gap is no longer in the entry cache, which triggers an async fetch that reads from disk. Considering that raft-engine's read performance is not as good as RocksDB's, once there are a lot of Regions triggering async fetch, replicating logs to the new peer becomes slow. If there is a conf change promoting the learner and demoting another peer, the commit index can't be advanced in the joint state because the to-be-learner peer doesn't catch up on logs in time.

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Co-authored-by: Connor <zbk602423539@gmail.com>
) close tikv#13077

When adding a new peer, `alive_cache_idx` would not account for a new peer that is still applying a snapshot, so the entry cache may be compacted because `alive_cache_idx` equals `applied_idx`. After the snapshot is applied, the new peer's log gap is no longer in the entry cache, which triggers an async fetch that reads from disk. Considering that raft-engine's read performance is not as good as RocksDB's, once there are a lot of Regions triggering async fetch, replicating logs to the new peer becomes slow. If there is a conf change promoting the learner and demoting another peer, the commit index can't be advanced in the joint state because the to-be-learner peer doesn't catch up on logs in time.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
…13120) ref #13060, ref #13078

In some cases, such as the one mentioned in #13078, the commit log duration became high. In that case, the needed log is not in the entry cache and there are many raft log async fetch tasks. This commit adds a log to show the cache first index and peers' progress when there is any long-uncommitted proposal. It also adds a metric to show the duration of the async fetch tasks.

Signed-off-by: cosven <yinshaowen241@gmail.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
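To make the failure mode described in these commit messages concrete, here is a conceptual sketch, with illustrative types rather than TiKV's real storage API, of the cache-first read path and its disk fallback:

```rust
// If the requested range is still in the in-memory entry cache, it can be
// served immediately; otherwise it has to be read from the raft engine as an
// asynchronous fetch, which is the slow path described above.
enum FetchResult {
    FromCache(Vec<u64>),
    AsyncFetchFromDisk { low: u64, high: u64 },
}

struct Store {
    cache_first_index: u64,
    cache_last_index: u64,
}

impl Store {
    fn entries(&self, low: u64, high: u64) -> FetchResult {
        if low >= self.cache_first_index && high <= self.cache_last_index + 1 {
            FetchResult::FromCache((low..high).collect())
        } else {
            // The gap is no longer cached; it must be fetched from disk.
            FetchResult::AsyncFetchFromDisk { low, high }
        }
    }
}

fn main() {
    // The cache was compacted to applied_idx while the new peer applied its
    // snapshot, so the entries it needs next are gone from memory.
    let store = Store { cache_first_index: 100, cache_last_index: 120 };
    match store.entries(60, 100) {
        FetchResult::FromCache(_) => println!("served from cache"),
        FetchResult::AsyncFetchFromDisk { low, high } => {
            println!("async fetch from raft engine for [{}, {})", low, high)
        }
    }
}
```

Once the cache no longer covers the gap, every such Region pays the async-fetch cost, which is what showed up as high commit log duration.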
What is changed and how it works?
Issue Number: Close #13077
What's Changed:
Related changes
Check List
Tests
before
after
Release note