[DocDB] Enable allocation sampling for gperftools tcmalloc #18561
Labels
area/docdb
YugabyteDB core features
kind/enhancement
This is an enhancement of an existing feature
priority/medium
Medium priority issue
Comments
SrivastavaAnubhav
added
kind/enhancement
This is an enhancement of an existing feature
area/docdb
YugabyteDB core features
priority/high
High Priority
labels
Aug 4, 2023
SrivastavaAnubhav
added a commit
that referenced
this issue
Aug 21, 2023
…malloc.

Summary:
This diff enables /pprof/heap_snapshot for gperftools tcmalloc, and makes some quality-of-life improvements to the endpoints:
- Background sampling for gperftools tcmalloc has always been on, with a default sampling frequency of 16.7 MB (https://github.com/yugabyte/gperftools/blob/f8c3aeb952fde9c79eb69cca1a5cf7e74820c127/src/sampler.cc#L80-L86). This diff changes it to follow FLAGS_profiler_sample_freq_bytes (1 MB as of the time of writing).
- Add flag enable_process_lifetime_heap_**sampling** and change semantics of enable_process_lifetime_heap_**profiling**.
  - Before this diff, enable_process_lifetime_heap_**profiling** used to cause gperftools tcmalloc to sample every allocation (this is very expensive, and was only ever meant for brief debugging) and google tcmalloc to sample at the rate specified by profiler_sample_freq_bytes. It used to be off by default for gperftools tcmalloc and on by default for google tcmalloc. After this diff, it is off by default for both, and has no effect on sampling for google tcmalloc.
  - enable_process_lifetime_heap_**sampling** is a new flag that controls whether allocations should be sampled at the rate specified by profiler_sample_freq_bytes. It is on by default for both gperftools and google tcmalloc.
    - The behavior for new universes using google tcmalloc is the same as before: sampling is enabled by default
    - The behavior for new universes using gperftools tcmalloc is different: sampling is now enabled by default
    - Upgrading from gperftools to google tcmalloc will enable sampling because the profiling flag will be true by default
    - Before this diff, downgrading from google tcmalloc to gperftools tcmalloc would enable all-allocation sampling (since the _profiling flag was used for google tcmalloc sampling and gperftools tcmalloc all-allocation profiling). After this diff, downgrading will cause no change because the defaults are the same for both tcmallocs
- Use abseil's symbolizer instead of glog's. This is a couple of orders of magnitude faster, and should speed up yb_prof as well.
- Add button for the /pprof/heap_snapshot endpoint for google and gperftools tcmalloc
- Add button for the /pprof/heap endpoint for google tcmalloc only (there is no human-readable output at this endpoint for gperftools, one must use yb_prof.py)
- Add order_by count / bytes option to the pprof URLs

Memory tabs (gperftools tcmalloc vs google tcmalloc): {F97831} {F97891}
/pprof/heap_snapshot page: {F97834}
/pprof/heap page: {F97892}

Jira: DB-7502

Test Plan:
NB: Gperftools tcmalloc is off by default in master, so you must compile with `ybd --use_gperftools_tcmalloc --clean`.
Reverted the fix in D26107 and verified that the expected stack showed up in /pprof/heap_snapshot.
Also ran `ybd --cxx_test --use_gperftools_tcmalloc pprof-path-handler_util-test --gtest_filter "SamplingProfilerTest.HeapSnapshot" -n 10000` locally.
When backporting to 2.14 to 2.18 (where gperftools is on by default), I will also run some perf workloads to ensure there are no regressions.

Reviewers: mlillibridge, hsunder, kpopali, mbautin
Reviewed By: hsunder, mbautin
Subscribers: yql, ybase, bogdan, mbautin
Differential Revision: https://phorge.dev.yugabyte.com/D27565
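As a reading aid for the flag changes described in the summary above, here is a small Python model of the defaults and their effect. It is only a sketch and not YugabyteDB code: the `Flags` type, the function names, and the mode labels (`"rate"`, `"every_allocation"`, `"builtin_interval"`, `"off"`) are hypothetical. Two behaviors are assumptions rather than statements from the summary: that turning the relevant flag off disables rate-based sampling, and that the profiling flag can be ignored after the diff (the summary only says it no longer affects sampling on google tcmalloc).

```python
from typing import NamedTuple, Optional


class Flags(NamedTuple):
    profiling: bool           # enable_process_lifetime_heap_profiling
    sampling: Optional[bool]  # enable_process_lifetime_heap_sampling (None: flag not introduced yet)


def default_flags(allocator: str, after_diff: bool) -> Flags:
    """Default flag values as stated in the summary."""
    if after_diff:
        # Profiling is off and sampling is on by default for both allocators.
        return Flags(profiling=False, sampling=True)
    # Before: profiling was off for gperftools and on for google tcmalloc.
    return Flags(profiling=(allocator == "google"), sampling=None)


def effective_mode(allocator: str, flags: Flags, after_diff: bool) -> str:
    """Hypothetical label for how allocations end up being sampled."""
    if after_diff:
        # Assumption: with the sampling flag off, no rate-based sampling happens.
        # The profiling flag is deliberately not modeled here.
        return "rate" if flags.sampling else "off"
    if allocator == "gperftools":
        # Before: the profiling flag meant "sample every allocation"; otherwise
        # only the built-in background interval (16.7 MB) applied.
        return "every_allocation" if flags.profiling else "builtin_interval"
    # google tcmalloc before the diff. Assumption: flag off meant no sampling.
    return "rate" if flags.profiling else "off"
```

Feeding the google tcmalloc defaults into a gperftools build reproduces the downgrade case from the summary: before the diff the result is `"every_allocation"`, while after the diff both allocators resolve to `"rate"`.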
SrivastavaAnubhav
added a commit
that referenced
this issue
Sep 19, 2023
…point for gperftools tcmalloc.

Summary:
Original commit: f142d7f / D26585
Original commit: c456bce / D27565

We want the change setting tcmalloc sampling frequency (D26585) and the change enabling sampling (D27565) in the same diff to avoid a perf regression, which is why they are being backported together.

-----------------------------------

This diff enables /pprof/heap_snapshot for gperftools tcmalloc, and makes some quality-of-life improvements to the endpoints:
- Background sampling for gperftools tcmalloc has always been on, with a default sampling frequency of 16.7 MB (https://github.com/yugabyte/gperftools/blob/f8c3aeb952fde9c79eb69cca1a5cf7e74820c127/src/sampler.cc#L80-L86). This diff changes it to follow FLAGS_profiler_sample_freq_bytes (1 MB as of the time of writing).
- Add flag enable_process_lifetime_heap_**sampling** and change semantics of enable_process_lifetime_heap_**profiling**.
  - Before this diff, enable_process_lifetime_heap_**profiling** used to cause gperftools tcmalloc to sample every allocation (this is very expensive, and was only ever meant for brief debugging) and google tcmalloc to sample at the rate specified by profiler_sample_freq_bytes. It used to be off by default for gperftools tcmalloc and on by default for google tcmalloc. After this diff, it is off by default for both, and has no effect on sampling for google tcmalloc.
  - enable_process_lifetime_heap_**sampling** is a new flag that controls whether allocations should be sampled at the rate specified by profiler_sample_freq_bytes. It is on by default for both gperftools and google tcmalloc.
    - The behavior for new universes using google tcmalloc is the same as before: sampling is enabled by default
    - The behavior for new universes using gperftools tcmalloc is different: sampling is now enabled by default
    - Upgrading from gperftools to google tcmalloc will enable sampling because the profiling flag will be true by default
    - Before this diff, downgrading from google tcmalloc to gperftools tcmalloc would enable all-allocation sampling (since the _profiling flag was used for google tcmalloc sampling and gperftools tcmalloc all-allocation profiling). After this diff, downgrading will cause no change because the defaults are the same for both tcmallocs
- Use abseil's symbolizer instead of glog's. This is a couple of orders of magnitude faster, and should speed up yb_prof as well.
- Add button for the /pprof/heap_snapshot endpoint for google and gperftools tcmalloc
- Add button for the /pprof/heap endpoint for google tcmalloc only (there is no human-readable output at this endpoint for gperftools, one must use yb_prof.py)
- Add order_by count / bytes option to the pprof URLs

Memory tabs:
Gperftools tcmalloc: https://user-images.githubusercontent.com/17299377/269038442-36e66a36-6c34-4076-87ff-0762c2690887.png
Google tcmalloc: https://user-images.githubusercontent.com/17299377/269038456-1f2edd82-1f61-4fcd-b404-ad3b9aa8853d.png

/pprof/heap_snapshot page: https://user-images.githubusercontent.com/17299377/269038496-617ad08a-1c37-402e-84e2-9290c8eed00d.png
/pprof/heap page: https://user-images.githubusercontent.com/17299377/269038514-e4680bac-ca8f-476d-8a02-e4a0733a6a25.png

Jira: DB-6854, DB-7502

Test Plan:
Reverted the fix in D26107 and verified that the expected stack showed up in /pprof/heap_snapshot.
Also ran `ybd --cxx_test --use_gperftools_tcmalloc pprof-path-handler_util-test --gtest_filter "SamplingProfilerTest.HeapSnapshot" -n 10000` locally.
See #17758 (comment) for perf results. Perf tests on TPCC and Sysbench workloads did not show any performance changes against 2.18.3.0-b71.
I also ran Featurebench's scan workload and found no significant performance differences. Note: this workload seems to have high variance. My diff claimed there was a regression against 2.18.3.0-b73, but not against 2.18.4.0-b1, despite there being only 1 DB commit (9cf4912) between the two, which only affected xcluster.

Reviewers: mbautin
Reviewed By: mbautin
Subscribers: mbautin, bogdan, ybase, yql
Differential Revision: https://phorge.dev.yugabyte.com/D28051
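To give a feel for what moving the sampling interval from 16.7 MB to 1 MB means, the sketch below simulates byte-interval allocation sampling in Python. It is a simplified model and not the gperftools or google tcmalloc sampler: the function name is hypothetical, the gap between samples is drawn from an exponential distribution whose mean is the configured interval (an assumption about the general technique), and at most one sample is taken per allocation.

```python
import random
from typing import Iterable


def count_samples(alloc_sizes: Iterable[int], sample_freq_bytes: int, seed: int = 0) -> int:
    """Count how many allocations a byte-interval sampler would record.

    A countdown of "bytes until the next sample" is decremented by each
    allocation's size; when it reaches zero the allocation is sampled and a
    new gap is drawn with mean `sample_freq_bytes`.
    """
    rng = random.Random(seed)
    bytes_until_sample = rng.expovariate(1.0 / sample_freq_bytes)
    sampled = 0
    for size in alloc_sizes:
        bytes_until_sample -= size
        if bytes_until_sample <= 0:
            sampled += 1
            bytes_until_sample = rng.expovariate(1.0 / sample_freq_bytes)
    return sampled


# 1 GiB worth of 4 KiB allocations (a made-up workload for illustration).
workload = [4096] * (1024 * 1024 * 1024 // 4096)
fine = count_samples(workload, 1024 * 1024)   # 1 MiB interval
coarse = count_samples(workload, 16_700_000)  # 16.7 MB interval
```

In this model the expected number of samples is roughly the total allocated bytes divided by the interval, so the 1 MiB setting records on the order of a thousand samples for this workload and the 16.7 MB setting only a few dozen. That is the trade-off the perf tests above check: more samples give a more detailed heap snapshot at the cost of more sampling work.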
yugabyte-ci
added
priority/medium
Medium priority issue
and removed
priority/high
High Priority
labels
Sep 21, 2023
SrivastavaAnubhav
added a commit
that referenced
this issue
Oct 4, 2023
…point for gperftools tcmalloc.

Summary:
Original commit: f142d7f / D26585
Original commit: c456bce / D27565

-----------------------------------

Note: This diff introduces `profiler_sample_freq_bytes` with a default value of 1 MiB, and thus implicitly includes D26585 (which bumped the value up from 100 KiB). This diff also adds Abseil support, since the symbolizer is much faster.

Original summary with google tcmalloc-specific parts removed since google tcmalloc is not available in 2.16 (see the original commit of D27565 for the full original summary):

> This diff enables /pprof/heap_snapshot for gperftools tcmalloc, and makes some quality-of-life improvements to the endpoints:
> - Background sampling for gperftools tcmalloc has always been on, with a default sampling frequency of 16.7 MB (https://github.com/yugabyte/gperftools/blob/f8c3aeb952fde9c79eb69cca1a5cf7e74820c127/src/sampler.cc#L80-L86). This diff changes it to follow `profiler_sample_freq_bytes` (1 MB as of the time of writing).
> - Add flag enable_process_lifetime_heap_**sampling**.
>   - enable_process_lifetime_heap_**sampling** is a new flag that controls whether allocations should be sampled at the rate specified by `profiler_sample_freq_bytes`. It is on by default for both gperftools and google tcmalloc.
>   - The behavior for new universes using gperftools tcmalloc is different: sampling every `profiler_sample_freq_bytes` is now enabled by default
> - Use abseil's symbolizer instead of glog's. This is a couple of orders of magnitude faster, and should speed up yb_prof (used for `/pprof/heap`) as well.
> - Add button for the /pprof/heap_snapshot endpoint
> - Add order_by count / bytes option to pprof/heap_snapshot
>
> Memory tabs:
> Gperftools tcmalloc: https://user-images.githubusercontent.com/17299377/269038442-36e66a36-6c34-4076-87ff-0762c2690887.png
>
> /pprof/heap_snapshot page:
> https://user-images.githubusercontent.com/17299377/269038496-617ad08a-1c37-402e-84e2-9290c8eed00d.png

Jira: DB-6854, DB-7502

Test Plan:
Reverted the fix in D26107 and verified that the expected stack showed up in /pprof/heap_snapshot.
Also ran `ybd --cxx_test --use_gperftools_tcmalloc pprof-path-handler_util-test --gtest_filter "SamplingProfilerTest.HeapSnapshot" -n 10000` locally.
Saw no significant perf impact; see #17758 (comment) for full results.

Reviewers: mbautin, esheng
Reviewed By: mbautin
Subscribers: yql, ybase, bogdan, mbautin
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D28702
SrivastavaAnubhav
added a commit
that referenced
this issue
Oct 11, 2023
…point for gperftools tcmalloc.

Summary:
Original commit: f142d7f / D26585
Original commit: c456bce / D27565

-----------------------------------

Note: This diff introduces `profiler_sample_freq_bytes` with a default value of 1 MiB, and thus implicitly includes D26585 (which bumped the value up from 100 KiB). This diff also adds Abseil support (since the symbolizer is much faster) by updating thirdparty to commit `78ea75d798b43483779b760ba55e5f1e63494b0b`.

Original summary with google tcmalloc-specific parts removed since google tcmalloc is not available in 2.14 (see the original commit of D27565 for the full original summary):

> This diff enables /pprof/heap_snapshot for gperftools tcmalloc, and makes some quality-of-life improvements to the endpoints:
> - Background sampling for gperftools tcmalloc has always been on, with a default sampling frequency of 16.7 MB (https://github.com/yugabyte/gperftools/blob/f8c3aeb952fde9c79eb69cca1a5cf7e74820c127/src/sampler.cc#L80-L86). This diff changes it to follow `profiler_sample_freq_bytes` (1 MB as of the time of writing).
> - Add flag enable_process_lifetime_heap_**sampling**.
>   - enable_process_lifetime_heap_**sampling** is a new flag that controls whether allocations should be sampled at the rate specified by `profiler_sample_freq_bytes`. It is on by default for both gperftools and google tcmalloc.
>   - The behavior for new universes using gperftools tcmalloc is different: sampling every `profiler_sample_freq_bytes` is now enabled by default
> - Use abseil's symbolizer instead of glog's. This is a couple of orders of magnitude faster, and should speed up yb_prof (used for `/pprof/heap`) as well.
> - Add button for the /pprof/heap_snapshot endpoint
> - Add order_by count / bytes option to pprof/heap_snapshot
>
> Memory tabs:
> Gperftools tcmalloc: https://user-images.githubusercontent.com/17299377/269038442-36e66a36-6c34-4076-87ff-0762c2690887.png
>
> /pprof/heap_snapshot page:
> https://user-images.githubusercontent.com/17299377/269038496-617ad08a-1c37-402e-84e2-9290c8eed00d.png

Jira: DB-6854, DB-7502

Test Plan:
Reverted the fix in D26107 and verified that the expected stack showed up in /pprof/heap_snapshot.
Also ran `ybd --cxx_test --use_gperftools_tcmalloc pprof-path-handler_util-test --gtest_filter "SamplingProfilerTest.HeapSnapshot" -n 10000` locally.
See #17758 (comment) for perf results.
Jenkins: compile only

Reviewers: mbautin, esheng
Reviewed By: mbautin
Subscribers: mbautin, bogdan, ybase, yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D28783
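The "order_by count / bytes" option mentioned in the summaries above can be pictured as a choice of sort key over aggregated samples. The Python sketch below illustrates that idea only; it is not the C++ handler behind /pprof/heap_snapshot, and the row shape `(stack, count, bytes)` and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int]  # (stack, sample count, sampled bytes)


def aggregate(samples: Iterable[Tuple[str, int]]) -> List[Row]:
    """Group (stack, size) samples into one row per call stack."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for stack, size in samples:
        totals[stack][0] += 1
        totals[stack][1] += size
    return [(stack, count, size) for stack, (count, size) in totals.items()]


def order_rows(rows: List[Row], order_by: str = "bytes") -> List[Row]:
    """Sort rows in descending order by sample count or by sampled bytes."""
    if order_by not in ("count", "bytes"):
        raise ValueError(f"unsupported order_by: {order_by!r}")
    key_index = 1 if order_by == "count" else 2
    return sorted(rows, key=lambda row: row[key_index], reverse=True)
```

Ordering by bytes surfaces the stacks holding the most memory, while ordering by count surfaces the stacks that allocate most often, which can differ when a few large allocations compete with many small ones.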
Jira Link: DB-7502
Description
The /pprof/heap_snapshot endpoint introduced in 9554847 has proven very useful for debugging untracked memory. Gperftools also has allocation sampling (though it does not support peak heap), so it would be good to enable the endpoint for builds using gperftools tcmalloc as well.