CDC message batching is too aggressive and may exceed the default memory quota #11082

Closed
overvenus opened this issue Oct 16, 2021 · 1 comment · Fixed by #11086
Labels
component/CDC Component: Change Data Capture type/bug Type: Issue - Confirmed a bug

Comments

@overvenus
Member

Bug Report

What version of TiKV are you using?

v5.1.1

Steps to reproduce

Deploy a cluster with 1 TiKV node (SSD) and 1 CDC node (HDD).
Set the TiDB GC lifetime to 2400h.
Run sysbench prepare: 10 tables, 100GB of data.
Create a changefeed with start-ts = gcsafepoint+1.

What did you expect?

Memory consumption should not exceed the default memory quota (512MB).

What happened?

CDC message batching is too aggressive and may exceed the default memory quota. This causes the eventfeed RPC to melt down.
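
For readers unfamiliar with the idea of a memory quota, here is a minimal sketch of a byte-based quota: a sender reserves bytes before queuing an event and releases them once the event is sent. The `Quota` type and its `alloc`/`free`/`in_use` methods are hypothetical names chosen for illustration; this is not TiKV's implementation, only a simplified model of the concept.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, simplified byte-based quota (illustration only, not TiKV's type).
struct Quota {
    capacity: usize,
    in_use: AtomicUsize,
}

impl Quota {
    fn new(capacity: usize) -> Self {
        Quota { capacity, in_use: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`. Returns false if the reservation would
    /// push usage past `capacity`; nothing is reserved in that case.
    fn alloc(&self, bytes: usize) -> bool {
        let mut cur = self.in_use.load(Ordering::Relaxed);
        loop {
            let next = match cur.checked_add(bytes) {
                Some(n) if n <= self.capacity => n,
                _ => return false,
            };
            // Retry if another thread changed `in_use` in the meantime.
            match self.in_use.compare_exchange_weak(
                cur,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual,
            }
        }
    }

    /// Release bytes that were previously reserved with `alloc`.
    /// The caller must not free more than it allocated.
    fn free(&self, bytes: usize) {
        self.in_use.fetch_sub(bytes, Ordering::AcqRel);
    }

    /// Bytes currently reserved.
    fn in_use(&self) -> usize {
        self.in_use.load(Ordering::Relaxed)
    }
}
```

A quota like this only bounds memory that is actually accounted through it. Any data that is buffered or batched before (or without) a matching reservation is invisible to the check, which is one way a process can end up above its nominal quota.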

@overvenus overvenus added component/CDC Component: Change Data Capture type/bug Type: Issue - Confirmed a bug labels Oct 16, 2021
@github-actions github-actions bot added this to Need Triage in Question and Bug Reports Oct 16, 2021
@overvenus
Member Author

After decreasing the batching parameters, memory consumption is much lower.

diff --git a/components/cdc/src/channel.rs b/components/cdc/src/channel.rs
index 56917233b..c4bb6add6 100644
--- a/components/cdc/src/channel.rs
+++ b/components/cdc/src/channel.rs
@@ -19,8 +19,9 @@ use kvproto::cdcpb::ChangeDataEvent;
 use tikv_util::{impl_display_as_debug, warn};

 use crate::service::{CdcEvent, EventBatcher};
+use crate::metrics::*;

-const CDC_MSG_MAX_BATCH_SIZE: usize = 128;
+const CDC_MSG_MAX_BATCH_SIZE: usize = 63;
 // Assume the average size of event is 1KB.
 // 2 = (CDC_MSG_MAX_BATCH_SIZE * 1KB / service::CDC_MAX_RESP_SIZE).ceil() + 1 /* reserve for ResolvedTs */;
 pub const CDC_EVENT_MAX_BATCH_SIZE: usize = 2;
diff --git a/components/cdc/src/delegate.rs b/components/cdc/src/delegate.rs
index 7da71943e..6bc33de44 100644
--- a/components/cdc/src/delegate.rs
+++ b/components/cdc/src/delegate.rs
@@ -40,7 +40,7 @@ use crate::old_value::{OldValueCache, OldValueCallback};
 use crate::service::{CdcEvent, ConnID};
 use crate::{Error, Result};

-const EVENT_MAX_SIZE: usize = 6 * 1024 * 1024; // 6MB
+const EVENT_MAX_SIZE: usize = 32 * 1024; // 32KB
 static DOWNSTREAM_ID_ALLOC: AtomicUsize = AtomicUsize::new(0);

 /// A unique identifier of a Downstream.
diff --git a/components/cdc/src/service.rs b/components/cdc/src/service.rs
index fc60b2be6..fada281e8 100644
--- a/components/cdc/src/service.rs
+++ b/components/cdc/src/service.rs
@@ -279,7 +279,7 @@ impl ChangeData for Service {
         mut sink: DuplexSink<ChangeDataEvent>,
     ) {
         // TODO explain buffer.
-        let buffer = 1024;
+        let buffer = 128;
         let (event_sink, mut event_drain) = channel(buffer, self.memory_quota.clone());
         let peer = ctx.peer();
         let conn = Conn::new(event_sink, peer);
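
A back-of-envelope calculation suggests why the old constants are risky. The sketch below assumes, purely for illustration, that a single batch can hold `CDC_MSG_MAX_BATCH_SIZE` events and that each event can be as large as `EVENT_MAX_SIZE`. The issue does not spell out exactly how these constants interact, and real events are usually much smaller, so treat the result as a rough upper bound rather than a measurement. The helper name `worst_case_batch_bytes` is hypothetical.

```rust
const KB: u64 = 1024;
const MB: u64 = 1024 * KB;

/// Default CDC memory quota quoted in the report.
const QUOTA: u64 = 512 * MB;

/// Rough upper bound on the bytes held by one batch, assuming every
/// event in the batch is at the size cap (an assumption, not a measurement).
fn worst_case_batch_bytes(max_batch: u64, event_max_size: u64) -> u64 {
    max_batch * event_max_size
}

/// Old constants: 128 events of up to 6MB each.
fn before() -> u64 {
    worst_case_batch_bytes(128, 6 * MB)
}

/// New constants: 63 events of up to 32KB each.
fn after() -> u64 {
    worst_case_batch_bytes(63, 32 * KB)
}

/// True when a single worst-case batch alone would exceed the quota.
fn exceeds_quota(bytes: u64) -> bool {
    bytes > QUOTA
}
```

Under these assumptions, `before()` is 768MB, which is already above the 512MB quota for a single batch, while `after()` is 2016KB (about 2MB). The same multiplication applied to the channel `buffer` (1024 slots before, 128 after) shrinks by a similar order of magnitude.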


Question and Bug Reports automation moved this from Need Triage to Closed(This Week) Oct 18, 2021
overvenus added a commit to ti-srebot/tikv that referenced this issue Nov 23, 2021
Cc tikv#11082

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
overvenus added a commit to ti-srebot/tikv that referenced this issue Nov 23, 2021
Cc tikv#11082

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>
overvenus added a commit to ti-srebot/tikv that referenced this issue Nov 23, 2021
close tikv#11082

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>
ti-chi-bot pushed a commit that referenced this issue Nov 23, 2021
close #11082

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Neil Shen <overvenus@gmail.com>
overvenus added a commit to ti-srebot/tikv that referenced this issue Dec 15, 2021
Close tikv#11082

Signed-off-by: Neil Shen <overvenus@gmail.com>