[docdb] Reduce memory footprint of the per tablet Histograms #7805
amitanandaiyer added a commit that referenced this issue on Apr 6, 2021
…of having Histograms (in TabletMetrics and other objects) separately for each tablet.

Summary: Reduce the memory footprint of the per-tablet histograms.
- Define histogram metrics on a table-level MetricEntity instead of having one per tablet. We already export them to prometheus-metrics rolled up at the table level, so no precision is lost.
- Counters and gauges remain at the tablet level. Counters updated with incr/decr could probably be moved up as well, but gauges updated with set_value must stay at the tablet level: metrics such as `is_raft_leader` do not make sense at the table level.

Test Plan:
- Jenkins
- Spin up a local cluster with yb-ctl and compare the output of 127.0.0.1:9000/prometheus-metrics with and without the changes {F15979} {F15980}
- Space usage (in-alloc bytes): before {F15984} vs. after {F15983}

Reviewers: bogdan, kannan, rahuldesirazu
Reviewed By: rahuldesirazu
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11089
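The "no loss of precision" argument rests on histograms being mergeable: adding per-tablet bucket counts gives the same buckets as recording every sample into one shared table-level histogram. The sketch below illustrates that property under stated assumptions. `BucketHistogram` is a hypothetical fixed-bucket histogram, not YugabyteDB's `Histogram` class, and its power-of-two buckets are chosen only for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-bucket histogram used only for illustration; the real
// YugabyteDB Histogram class has a different API and bucketing scheme.
class BucketHistogram {
 public:
  static constexpr std::size_t kNumBuckets = 8;

  // Records a value into bucket floor(log2(value)) for value >= 1
  // (bucket 0 for value 0), clamped to the last bucket.
  void Increment(uint64_t value) {
    std::size_t bucket = 0;
    while (value > 1 && bucket + 1 < kNumBuckets) {
      value >>= 1;
      ++bucket;
    }
    ++counts_[bucket];
    ++total_count_;
  }

  // Adds another histogram's bucket counts into this one (a roll-up).
  void MergeFrom(const BucketHistogram& other) {
    for (std::size_t i = 0; i < kNumBuckets; ++i) {
      counts_[i] += other.counts_[i];
    }
    total_count_ += other.total_count_;
  }

  uint64_t TotalCount() const { return total_count_; }

  bool SameBuckets(const BucketHistogram& other) const {
    return counts_ == other.counts_;
  }

 private:
  std::array<uint64_t, kNumBuckets> counts_{};
  uint64_t total_count_ = 0;
};
```

For example, recording {1, 5, 40} into one tablet's histogram and {3, 900, 7} into another's, then merging both, yields exactly the buckets of a single histogram that saw all six values. A gauge written with `set_value`, such as `is_raft_leader`, has no comparable merge rule, which is why the commit keeps gauges per tablet.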
The images posted on the commit confirm that, at each call stack, the number of TabletMetrics objects in use drops from 512 (the number of tablets) to 1 (the number of tables created). Before commit a4eb8ae:
After commit a4eb8ae:
amitanandaiyer added a commit that referenced this issue on Apr 17, 2021
…belonging to a table.

Summary: As part of reducing the memory footprint of histograms, we want to track metrics at the table level, since we only export/store Prometheus metrics rolled up to the table level. This change implements rocksdb::Statistics on top of the Metric/MetricEntity framework and uses a table-level MetricEntity to instantiate histograms that are shared across tablets.

Depends on D11089

Test Plan: Jenkins
Before {F16052} After {F16053}
Restart before and after, and compare the /prometheus-metrics output to make sure that all the metrics are exported correctly.

```
18:02 $ curl 127.0.0.1:9000/prometheus-metrics > after2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  178k  100  178k    0     0  2805k      0 --:--:-- --:--:-- --:--:-- 2825k
dev-server-amitanand2:~/code/yugabyte-perf [:e00d18a|✚ 47…13⚑ 17]
18:03 $ for i in before after2 ; do cat $i | sort | sed 's/}.*/}/' > compare/$i; done
dev-server-amitanand2:~/code/yugabyte-perf [:e00d18a|✚ 47…13⚑ 17]
18:03 $ diff compare/*
dev-server-amitanand2:~/code/yugabyte-perf [:e00d18a|✚ 47…13⚑ 17]
18:03 $ md5sum compare/*
3b271eff87a4208f4e9f8719ce0391e5  compare/after2
3b271eff87a4208f4e9f8719ce0391e5  compare/before
dev-server-amitanand2:~/code/yugabyte-perf [:e00d18a|✚ 47…13⚑ 17]
```

Reviewers: bogdan, rahuldesirazu
Reviewed By: rahuldesirazu
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11138
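The idea of a statistics object backed by table-level histograms that every tablet's storage engine records into can be sketched as follows. This is an illustration under loud assumptions: `StatisticsInterface` and its `MeasureTime` method are hypothetical stand-ins, not the actual `rocksdb::Statistics` API. `SharedHistogram` only tracks a count and a sum instead of buckets, and `TabletStorage` is a hypothetical per-tablet handle.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical statistics interface a storage engine calls into; the real
// rocksdb::Statistics API differs.
class StatisticsInterface {
 public:
  virtual ~StatisticsInterface() = default;
  virtual void MeasureTime(uint32_t histogram_type, uint64_t value) = 0;
};

// Thread-safe stand-in for a histogram: tracks only sample count and sum.
struct SharedHistogram {
  std::atomic<uint64_t> count{0};
  std::atomic<uint64_t> sum{0};
  void Increment(uint64_t value) {
    count.fetch_add(1, std::memory_order_relaxed);
    sum.fetch_add(value, std::memory_order_relaxed);
  }
};

// Statistics whose histograms are owned once per table; every tablet of the
// table is handed a pointer to the same instance.
class TableLevelStatistics : public StatisticsInterface {
 public:
  explicit TableLevelStatistics(uint32_t num_histogram_types) {
    for (uint32_t i = 0; i < num_histogram_types; ++i) {
      histograms_.push_back(std::make_unique<SharedHistogram>());
    }
  }

  // Records into the shared histogram; unknown histogram types are ignored.
  void MeasureTime(uint32_t histogram_type, uint64_t value) override {
    if (histogram_type < histograms_.size()) {
      histograms_[histogram_type]->Increment(value);
    }
  }

  const SharedHistogram& histogram(uint32_t histogram_type) const {
    return *histograms_.at(histogram_type);
  }

 private:
  std::vector<std::unique_ptr<SharedHistogram>> histograms_;
};

// Hypothetical per-tablet storage handle that records into whatever
// statistics object it was given at construction.
class TabletStorage {
 public:
  static constexpr uint32_t kWriteMicros = 0;  // hypothetical histogram id

  explicit TabletStorage(std::shared_ptr<StatisticsInterface> stats)
      : stats_(std::move(stats)) {}

  void RecordWrite(uint64_t micros) { stats_->MeasureTime(kWriteMicros, micros); }

 private:
  std::shared_ptr<StatisticsInterface> stats_;
};
```

Two `TabletStorage` instances built from the same `std::make_shared<TableLevelStatistics>(1)` write into one histogram, so memory for the histograms scales with the number of tables rather than the number of tablets. Atomics are used because, in this sketch, several tablets may record concurrently into the shared object.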
YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021
YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021
@amitanandaiyer brought this up as part of the investigation in #6676.
We should be able to optimize our Histogram usage: instead of storing histograms per tablet, bubble them up to the table level. We could store references to them at the TSTabletManager level and pass them into the Tablet instances, which keeps most of the existing code unchanged.
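The proposal of holding histogram references at the tablet-manager level and handing them to each tablet could look roughly like this. It is a minimal sketch, assuming hypothetical `TableMetricsRegistry`, `TableHistograms`, and `Tablet` types that are not YugabyteDB's actual classes. Histograms are shared per table, while a gauge such as `is_raft_leader` stays on each tablet.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical table-level histogram bundle (a single counter stands in for
// the real Histogram objects that TabletMetrics holds today).
struct TableHistograms {
  std::atomic<uint64_t> write_op_count{0};
  void RecordWrite() { write_op_count.fetch_add(1, std::memory_order_relaxed); }
};

// Hypothetical registry owned by the tablet manager: one TableHistograms per
// table, created lazily and shared by every tablet of that table.
class TableMetricsRegistry {
 public:
  std::shared_ptr<TableHistograms> GetOrCreate(const std::string& table_id) {
    std::lock_guard<std::mutex> lock(mu_);
    std::shared_ptr<TableHistograms>& entry = by_table_[table_id];
    if (!entry) entry = std::make_shared<TableHistograms>();
    return entry;
  }

  std::size_t num_tables() const {
    std::lock_guard<std::mutex> lock(mu_);
    return by_table_.size();
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, std::shared_ptr<TableHistograms>> by_table_;
};

// Hypothetical tablet: histograms come from the registry, while the
// is_raft_leader gauge remains per tablet.
class Tablet {
 public:
  Tablet(std::string tablet_id, std::shared_ptr<TableHistograms> histograms)
      : tablet_id_(std::move(tablet_id)), histograms_(std::move(histograms)) {}

  const std::string& tablet_id() const { return tablet_id_; }
  void ApplyWrite() { histograms_->RecordWrite(); }
  const TableHistograms* histograms() const { return histograms_.get(); }

  void set_is_raft_leader(bool value) { is_raft_leader_ = value; }
  bool is_raft_leader() const { return is_raft_leader_; }

 private:
  std::string tablet_id_;
  std::shared_ptr<TableHistograms> histograms_;
  bool is_raft_leader_ = false;
};
```

Because the tablet only receives a `std::shared_ptr` at construction, the existing per-tablet recording code paths barely change. A real implementation would also need to drop a table's entry once the table is deleted; this sketch does not handle that.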
Other notes