Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raftstore: Remove future_poller pool and batch Ticks #8457

Merged
merged 22 commits into from Sep 9, 2020

Conversation

Little-Wallace
Copy link
Contributor

Signed-off-by: Little-Wallace bupt2013211450@gmail.com

What problem does this PR solve?

To reduce the cost of TiKV heartbeat tick.

What is changed and how it works?

Batch all ticks in once poll together and call them in delay thread rather than future-poller.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • PR to update pingcap/tidb-ansible:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Integration test

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

  • No release note.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace
Copy link
Contributor Author

I tested it in a large cluster which has 25K regions. There is result ( I update cluster binary in 13:40):
image

@Little-Wallace Little-Wallace changed the title Remove future_poller pool and batch Ticks raftstore: Remove future_poller pool and batch Ticks Aug 17, 2020
@Little-Wallace Little-Wallace added the sig/raft Component: Raft, RaftStore, etc. label Aug 17, 2020
@@ -691,6 +707,37 @@ impl<EK: KvEngine, ER: RaftEngine, T: Transport, C: PdClient> RaftPoller<EK, ER,
self.poll_ctx.raft_metrics.ready.snapshot - self.previous_metrics.ready.snapshot
);
}

fn flush_ticks(&mut self) {
const TICKS: &[PeerTicks] = &[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use static and add a test to check if all types of PeerTicks has been included.

});
let idx = tick.bits() as usize;
let batch = &mut self.ctx.tick_batch[idx];
if batch.ticks.is_empty() || batch.wait_duration == timeout {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can batch all of the timer and combine the same timeout timer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the timeout of most ticks are different...

@@ -691,6 +707,37 @@ impl<EK: KvEngine, ER: RaftEngine, T: Transport, C: PdClient> RaftPoller<EK, ER,
self.poll_ctx.raft_metrics.ready.snapshot - self.previous_metrics.ready.snapshot
);
}

fn flush_ticks(&mut self) {
const TICKS: &[PeerTicks] = &[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a method to PeerTicks to return all ticks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@BusyJay BusyJay requested a review from NingLin-P August 26, 2020 07:29
#[derive(Default)]
pub struct PeerTickBatch {
pub ticks: Vec<Box<dyn FnOnce() + Send>>,
pub wait_duration: Duration,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can remove it, for the same tick the timeout should be the same, we can get the timeout from config when flushing ticks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace Little-Wallace added needs-cherry-pick-release-4.0 Type: Need cherry pick to release 4.0 needs-cherry-pick-release-3.0 Type: Need cherry pick to release 3.0 labels Aug 29, 2020
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace
Copy link
Contributor Author

/test

@@ -374,6 +390,21 @@ where
}
timeout
}

pub fn update_ticks_timeout(&mut self) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should call this function after config updated at

// update config
self.poll_ctx.perf_context_statistics.start();
if let Some(incoming) = self.cfg_tracker.any_new() {
match Ord::cmp(
&incoming.messages_per_tick,
&self.poll_ctx.cfg.messages_per_tick,
) {
CmpOrdering::Greater => {
self.store_msg_buf.reserve(incoming.messages_per_tick);
self.peer_msg_buf.reserve(incoming.messages_per_tick);
self.messages_per_tick = incoming.messages_per_tick;
}
CmpOrdering::Less => {
self.store_msg_buf.shrink_to(incoming.messages_per_tick);
self.peer_msg_buf.shrink_to(incoming.messages_per_tick);
self.messages_per_tick = incoming.messages_per_tick;
}
_ => {}
}
self.poll_ctx.cfg = incoming.clone();
}

or instead of maintaining wait_duration, we can get the up-to-date config from self.cfg through a function like fn get_wait_duration(&self, tick: PeerTicks) -> Duration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@codecov
Copy link

codecov bot commented Aug 31, 2020

Codecov Report

Merging #8457 into master will increase coverage by 0.09%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8457      +/-   ##
==========================================
+ Coverage   84.45%   84.55%   +0.09%     
==========================================
  Files         596      590       -6     
  Lines      145303   143853    -1450     
==========================================
- Hits       122714   121629    -1085     
+ Misses      22589    22224     -365     
Impacted Files Coverage Δ
components/pd_client/src/errors.rs 0.00% <0.00%> (-38.47%) ⬇️
components/error_code/src/raftstore.rs 0.00% <0.00%> (-21.06%) ⬇️
components/sst_importer/src/errors.rs 28.26% <0.00%> (-19.57%) ⬇️
components/engine_traits/src/errors.rs 0.00% <0.00%> (-18.19%) ⬇️
components/pd_client/src/client.rs 71.42% <0.00%> (-17.49%) ⬇️
components/txn_types/src/lib.rs 21.62% <0.00%> (-13.52%) ⬇️
components/raftstore/src/router.rs 78.49% <0.00%> (-11.41%) ⬇️
components/test_raftstore/src/router.rs 55.55% <0.00%> (-7.31%) ⬇️
src/storage/txn/mod.rs 28.57% <0.00%> (-6.50%) ⬇️
src/storage/kv/cursor.rs 73.44% <0.00%> (-5.44%) ⬇️
... and 157 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5496d07...2bd9257. Read the comment docs.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Copy link
Member

@NingLin-P NingLin-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot
Copy link
Contributor

@NingLin-P,Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments.See the corresponding SIG page for more information. Related SIG: raft(slack).

@Little-Wallace
Copy link
Contributor Author

/test

1 similar comment
@Little-Wallace
Copy link
Contributor Author

/test

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace Little-Wallace removed the needs-cherry-pick-release-3.0 Type: Need cherry pick to release 3.0 label Sep 9, 2020
@@ -158,8 +158,6 @@ fn test_tick_after_destroy() {
let engine_3 = cluster.get_engine(3);
must_get_equal(&engine_3, b"k1", b"v1");

let tick_fp = "on_raft_log_gc_tick_1";
fail::cfg(tick_fp, "pause").unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this failpoint to function on_raft_gc_log_tick and use return not pause then set raft_log_gc_tick_interval to 50ms in this test.

@@ -176,9 +176,9 @@ fn test_tick_after_destroy() {
cluster.must_put(b"k3", b"v3");

thread::sleep(cluster.cfg.raft_store.raft_log_gc_tick_interval.0);
fail::remove(tick_fp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order should not be changed.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Copy link
Member

@gengliqi gengliqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Status: PR - There is already 1 approval label Sep 9, 2020
@Little-Wallace Little-Wallace added the status/can-merge Status: Can merge to base branch label Sep 9, 2020
@ti-srebot
Copy link
Contributor

@Little-Wallace Oops! auto merge is restricted to Committers of the SIG.See the corresponding SIG page for more information. Related SIG: raft(slack).

@gengliqi
Copy link
Member

gengliqi commented Sep 9, 2020

/merge

@ti-srebot
Copy link
Contributor

@gengliqi Oops! This PR requires at least 2 LGTMs to merge. The current number of LGTM is 1.

@gengliqi gengliqi merged commit 03c3ec6 into tikv:master Sep 9, 2020
ti-srebot pushed a commit to ti-srebot/tikv that referenced this pull request Sep 9, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #8634

Little-Wallace pushed a commit that referenced this pull request Sep 11, 2020
* cherry pick #8457 to release-4.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
ldeng-ustc pushed a commit to ldeng-ustc/tikv that referenced this pull request Sep 12, 2020
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace Little-Wallace deleted the heartbeat branch November 23, 2021 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-release-4.0 Type: Need cherry pick to release 4.0 sig/raft Component: Raft, RaftStore, etc. status/can-merge Status: Can merge to base branch status/LGT1 Status: PR - There is already 1 approval
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants