Unexpectedly high memory consumption when using CassandraBatchTimeSeries workload #1079

bmatican · 2019-03-26T16:49:53Z

Running an in-house test of our batch workload on YCQL, we are seeing high memory pressure and hitting the soft memory limit.

From the /mem-trackers endpoint on one of the tablet servers (TS):

Id | Current Consumption | Peak Consumption | Limit
-- | -- | -- | --
root | 12.57G | 12.57G | 12.35G
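To put the row above in perspective, here is a minimal sketch that converts the two values to bytes and computes how far the root tracker is over its limit. The helper names `parse_size` and `utilization` are hypothetical, and the sketch assumes the `G` suffix denotes binary gigabytes; the ratio is the same either way.

```python
def parse_size(text: str) -> float:
    """Convert a size such as '12.57G' into bytes (binary units assumed)."""
    units = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
    suffix = text[-1].upper()
    if suffix in units:
        return float(text[:-1]) * units[suffix]
    return float(text)  # no suffix: treat the value as plain bytes


def utilization(current: str, limit: str) -> float:
    """Return current consumption as a fraction of the limit."""
    return parse_size(current) / parse_size(limit)


# Values copied from the root row of the /mem-trackers table.
ratio = utilization("12.57G", "12.35G")
print(f"root tracker at {ratio:.2%} of its limit")
```

This comes out at roughly 101.8%, in the same range as the 101.64% reported in the log line, which was presumably captured at a slightly different moment.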

Corresponding log messages from peers:

```
E0326 16:41:54.524134 14680 process_context.cc:180] SQL Error: Execution Error. Write(tablet: 2a0fc6961b134f47a1da85a84d46d0ea, num_ops: 500, num_attempts: 89, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 57676.014s (now: 57689.459s): Remote error (yb/rpc/outbound_call.cc:386): Service unavailable (yb/tserver/tablet_service.cc:239): Soft memory limit exceeded (at 101.64% of capacity)
INSERT INTO batch_ts_metrics_raw (metric_id, ts, value) VALUES (:metric_id, :ts, :value);
       ^^^^
W0326 16:41:54.525184 15093 cql_rpc.cc:271] CQL Call from xx.xx.xx.20:57000 took 73464ms. Details:
W0326 16:41:54.526329 15093 cql_rpc.cc:274] cql_details {
  type: "BATCH"
  call_details {
    sql_id: "a35b1d1d999509e2ab20a7e50d8fb5b3"
    sql_string: "INSERT INTO batch_ts_metrics_raw (metric_id, ts, value) VALUES (:metric_id, :ts, :value);"
  }
  call_details {
    sql_id: "a35b1d1d999509e2ab20a7e50d8fb5b3"
    sql_string: "INSERT INTO batch_ts_metrics_raw (metric_id, ts, value) VALUES (:metric_id, :ts, :value);"
  }
...
```

cc @kmuthukk


Summary: When one of the followers is missing a lot of log entries, the following situation can happen. The leader tries to send a big update request, which often times out, so the leader retries sending the same request every 3 seconds. Each of those requests consumes memory twice: 1) for the request protobuf and 2) for the serialized protobuf. As a result, memory consumption can grow very quickly.

The following issues are addressed in this diff:

1. Added mem trackers for sending and queueing serialized data.
2. Release the consensus update request protobuf as soon as possible.
3. Release serialized protobufs as soon as they are sent or time out.

Test Plan:

Launch a local cluster with:

    bin/yb-ctl --rf 3 create --disable_ysql

Launch the workload:

    java -jar target/yb-sample-apps.jar --workload CassandraBatchTimeseries --nodes 127.0.0.1:9042 --num_threads_read 2 --num_threads_write 2 --num_unique_keys -1

Stop one of the nodes for 60 seconds, then start it again:

    bin/yb-ctl stop_node 3 && sleep 60 && bin/yb-ctl start_node 3

Check that memory consumption does not grow too high.

Reviewers: mikhail, amitanand, bogdan

Reviewed By: bogdan

Subscribers: ybase, bharat

Differential Revision: https://phabricator.dev.yugabyte.com/D6408
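The retry behaviour described in the commit summary can be illustrated with a toy model. The sketch below is not YugabyteDB code: the `MemTracker` class, the `peak_memory` function, and all sizes and hold times are hypothetical, chosen only to show why holding both the protobuf and its serialized copy across retries inflates peak memory, and how releasing each buffer early bounds it.

```python
from dataclasses import dataclass


@dataclass
class MemTracker:
    """Toy byte counter; not the YugabyteDB MemTracker API."""
    current: int = 0
    peak: int = 0

    def consume(self, nbytes: int) -> None:
        self.current += nbytes
        self.peak = max(self.peak, self.current)

    def release(self, nbytes: int) -> None:
        self.current -= nbytes


def peak_memory(request_bytes, retries, retry_interval_s, hold_s, early_release):
    """Return the peak tracked bytes while a leader resends one big request.

    hold_s: how long a sent request's buffers stay alive (hypothetical).
    early_release: drop the protobuf right after it has been serialized.
    """
    tracker = MemTracker()
    live = []  # (release_time, nbytes) for buffers that are still held
    for attempt in range(retries):
        now = attempt * retry_interval_s
        # Free buffers whose hold time has elapsed.
        expired = [buf for buf in live if buf[0] <= now]
        live = [buf for buf in live if buf[0] > now]
        for _, nbytes in expired:
            tracker.release(nbytes)
        # Build the request protobuf, then serialize it: two copies exist.
        tracker.consume(request_bytes)  # request protobuf
        tracker.consume(request_bytes)  # serialized bytes
        if early_release:
            tracker.release(request_bytes)  # protobuf is no longer needed
            live.append((now + hold_s, request_bytes))
        else:
            live.append((now + hold_s, 2 * request_bytes))
    return tracker.peak


# Ten attempts, 3 seconds apart, with a request of 100 arbitrary units.
held_both = peak_memory(100, 10, 3, hold_s=30, early_release=False)
held_serialized = peak_memory(100, 10, 3, hold_s=30, early_release=True)
released_promptly = peak_memory(100, 10, 3, hold_s=3, early_release=True)
print(held_both, held_serialized, released_promptly)
```

In this model, keeping both copies for 30 seconds peaks at 2000 units, dropping the protobuf right after serialization peaks at 1100, and also releasing the serialized copy before the next retry keeps the peak at 200, which is the cost of a single request.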

…ed to the earlier commit 33835b0 Original commit message: [#1079]: Release sending buffers as soon as possible (summary, test plan, and reviewers are identical to the commit message above)

A later commit carries the same summary, test plan, and reviewers as the commit message above, and adds: Note: This commit provides additional functionality that is logically related to the earlier commit yugabyte@33835b0 and supersedes the commit yugabyte@3e89292

bmatican assigned spolitov Mar 26, 2019

bmatican added this to To Do in YBase features via automation Mar 26, 2019

kmuthukk added the area/docdb YugabyteDB core features label Mar 26, 2019

spolitov closed this as completed Apr 4, 2019

YBase features automation moved this from To Do to Done Apr 4, 2019
