lower consensus_max_batch_size_bytes default further #2739

Closed
kmuthukk opened this issue Oct 26, 2019 · 1 comment

@kmuthukk
Collaborator

When a TServer restarts (say, after 1 minute of downtime), all tablets on that TServer try to catch up from their leaders. The leaders' UpdateConsensus RPCs, each up to the default 32MB, run concurrently for hundreds of tablets and can cause unnecessary memory spikes, timeouts, and wasteful retries. Setting consensus_max_batch_size_bytes to a more conservative value such as 2MB should be good enough in practice, and should help in the near term until additional/better auto-throttling is put in.
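To put rough numbers on the memory concern, here is a back-of-the-envelope sketch. It assumes exactly 100 catching-up tablets (the issue only says "100s") and one full-size batch in flight per tablet, and it counts payload bytes only; the function name is made up for illustration and is not part of YugabyteDB.

```python
MB = 1024 * 1024


def worst_case_inflight_bytes(num_tablets: int, max_batch_size_bytes: int) -> int:
    """Upper bound on payload bytes if every tablet has one full batch in flight.

    Assumption: one outstanding UpdateConsensus batch per tablet; parsing and
    bookkeeping overhead is ignored.
    """
    return num_tablets * max_batch_size_bytes


# Hypothetical tablet count chosen for illustration.
NUM_TABLETS = 100

for batch_mb in (32, 2):
    total = worst_case_inflight_bytes(NUM_TABLETS, batch_mb * MB)
    print(f"{batch_mb}MB batches x {NUM_TABLETS} tablets: up to {total // MB}MB in flight")
```

Under these assumptions the bound drops from 3200MB to 200MB, which is the kind of reduction the proposed default is aiming for.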

@kmuthukk kmuthukk added area/docdb YugabyteDB core features priority/high High Priority labels Oct 26, 2019
@ttyusupov ttyusupov added this to To Do in YBase features via automation Oct 29, 2019
@ttyusupov ttyusupov moved this from To Do to In progress in YBase features Oct 29, 2019
ttyusupov added a commit that referenced this issue Nov 8, 2019
Summary: When a TServer restarts (say, after 1 minute of downtime), all tablets on that TServer try to catch up from their leaders. The leaders' UpdateConsensus RPCs, each up to the default 32MB, run concurrently for hundreds of tablets and can cause unnecessary memory spikes, timeouts, and wasteful retries. Setting consensus_max_batch_size_bytes to a more conservative value such as 2MB should be good enough in practice, and should help in the near term until additional/better auto-throttling is put in.

Test Plan:
- Did 4-5 ptest runs for each of 2MB and 32MB. The difference in results is mostly within the std dev between same-build runs. CassandraBatchTimeseries_w96_r0 is slightly better (almost within std dev) with the 2MB max batch size: the average is 410 kops/sec for 2MB versus 372 kops/sec for 32MB. Std dev is 33/37 kops/sec.
- With a 2MB max batch size and 48 follower tablet peers, the maximum tracked memory usage of live calls after a one-minute downtime is 128MB for raw call data plus 286MB for parsed call params, 414MB in total.

Reviewers: sergei, mikhail, bogdan, kannan

Reviewed By: kannan

Subscribers: rao, kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7475
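The Test Plan figures in the commit message can be sanity-checked with a few lines of arithmetic. This is only a sketch over the numbers quoted there; the per-peer average is a derived figure, not one reported in the commit, and since the message does not say which std dev belongs to which configuration, the larger one is used for comparison.

```python
# Memory figures from the Test Plan (2MB max batch size, 48 follower peers).
raw_mb, parsed_mb, peers = 128, 286, 48
total_mb = raw_mb + parsed_mb       # should match the reported 414MB total
per_peer_mb = total_mb / peers      # derived average, not stated in the commit

# Throughput figures for CassandraBatchTimeseries_w96_r0, in kops/sec.
mean_2mb, mean_32mb = 410, 372
std_devs = (33, 37)                 # mapping to configurations is not specified
diff = mean_2mb - mean_32mb
relative_gain = diff / mean_32mb

print(f"total tracked memory: {total_mb}MB, about {per_peer_mb:.1f}MB per peer")
print(f"throughput difference: {diff} kops/sec ({relative_gain:.1%}), "
      f"largest std dev: {max(std_devs)} kops/sec")
```

The difference between the two averages is close to one std dev, which agrees with the commit's "almost within std dev" wording.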
@ttyusupov ttyusupov moved this from In progress to Done in YBase features Nov 8, 2019
@bmatican
Contributor

Default is now 4MB.
