2.27.0.0-b559
tagged this
12 Sep 04:14
Summary: The connection manager is compiled and linked with Google's tcmalloc library, a high-performance memory allocator that other core YugabyteDB processes such as the Tablet Server (tserver) and Master also use. Consequently, malloc and free calls made within the connection manager are overridden by the implementations that tcmalloc provides. This change adds the ability to view tcmalloc memory statistics for the connection manager process by dumping them into the connection manager's own log files. It is intended for quick debugging only.

**Implementation Details:**

This enhancement provides the infrastructure to periodically capture tcmalloc heap snapshots. The implementation also links the necessary tcmalloc utility functions from the src/yb/util directory, making them available to the connection manager.

The functionality is controlled by two new GFlags:

- --ysql_conn_mgr_dump_heap_snapshot_interval: An integer flag (default: 0) that enables heap dumps and sets their frequency. When set to a positive value N, the existing cron thread dumps a point-in-time heap snapshot of the connection manager process every N seconds. The output includes the estimated memory allocations and the full stack traces of the code that made them.
- --ysql_conn_mgr_tcmalloc_sample_period: An integer flag that configures the granularity of memory profiling by setting the sampling period in bytes. This flag takes effect only when heap dumping is enabled (--ysql_conn_mgr_dump_heap_snapshot_interval > 0). When active, its value defaults to 1 MB to stay consistent with the tserver and master configurations.

**Future Extension:**

The next planned enhancement (tracked in ticket [[ https://github.com/yugabyte/yugabyte-db/issues/28442 | #28442 ]] ) is to automatically trigger a heap snapshot dump when the connection manager hits a pre-defined memory throttling limit. This behavior will replicate the existing memory-based dumping mechanism already implemented for the tserver and master processes.

**Benefits:**

- Pinpoint Memory Leaks: Enables developers to correlate memory growth with specific data structures and the exact call stacks that allocated them.
- Reproduce Issues: Provides consistent, detailed metrics that help in reproducing memory-related bugs in a controlled environment.
- Deepen Memory Insights: Offers a clear view of how different components and data structures consume memory within the connection manager, aiding performance tuning and optimization.

**Example output in connection manager logs:**

```
Estimated bytes: 124416 (121.500000KB) , estimated count: 36, sampled bytes: 6912, sampled count: 2, stack:
    tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
    tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
    tcmalloc::tcmalloc_internal::slow_alloc_small<>()
    __libc_realloc
    yb_kiwi_var_push
    od_backend_update_parameter
    od_backend_ready_wait
    od_reset
    od_frontend_cleanup
    od_frontend
    mm_scheduler_main
    mm_context_runner
Estimated bytes: 73728 (72.000000KB) , estimated count: 6, sampled bytes: 12288, sampled count: 1, stack:
    tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
    tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
    tcmalloc::tcmalloc_internal::slow_alloc_small<>()
    __libc_calloc
    td_new
    od_router_route
    od_frontend
    mm_scheduler_main
    mm_context_runner
......
Total estimated bytes: 3159712 (3085.656250KB)
```

Jira: DB-18054

Test Plan: Jenkins: all tests

Reviewers: skumar, asrinivasan, vikram.damle, arpit.saxena

Reviewed By: arpit.saxena

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D46224
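
When reading the heap snapshot entries shown in the example output above, it can help to group the estimated bytes by the first frame that is not part of the allocator. The following is a minimal Python sketch of such post-processing, not part of this change. The names `HeapEntry`, `parse_heap_snapshot`, and `first_app_frame` are hypothetical, and the parser assumes that the entry format matches the example exactly and that stack frames contain no internal whitespace.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class HeapEntry(NamedTuple):
    estimated_bytes: int
    estimated_count: int
    sampled_bytes: int
    sampled_count: int
    stack: list[str]


# Header of one entry, modeled on the example output (an assumption about the format).
ENTRY_RE = re.compile(
    r"Estimated bytes: (\d+) \([\d.]+KB\) ?, estimated count: (\d+), "
    r"sampled bytes: (\d+), sampled count: (\d+), stack:"
)


def parse_heap_snapshot(text: str) -> list[HeapEntry]:
    """Split a snapshot dump into entries; frames are whitespace-separated."""
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Drop the trailing summary line, if it follows the last entry.
        body = text[m.end():end].split("Total estimated bytes:")[0]
        # Skip elision markers such as "......" that are not stack frames.
        frames = [f for f in body.split() if set(f) != {"."}]
        entries.append(
            HeapEntry(int(m[1]), int(m[2]), int(m[3]), int(m[4]), frames)
        )
    return entries


def first_app_frame(entry: HeapEntry) -> str:
    """Return the first frame outside tcmalloc and the libc allocation shims."""
    for frame in entry.stack:
        if not frame.startswith(("tcmalloc::", "__libc_")):
            return frame
    return "<unknown>"


# The two entries from the example output, flattened onto single lines.
SAMPLE = (
    "Estimated bytes: 124416 (121.500000KB) , estimated count: 36, "
    "sampled bytes: 6912, sampled count: 2, stack: "
    "tcmalloc::tcmalloc_internal::SampleifyAllocation<>() "
    "tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>() "
    "tcmalloc::tcmalloc_internal::slow_alloc_small<>() __libc_realloc "
    "yb_kiwi_var_push od_backend_update_parameter od_backend_ready_wait "
    "od_reset od_frontend_cleanup od_frontend mm_scheduler_main mm_context_runner "
    "Estimated bytes: 73728 (72.000000KB) , estimated count: 6, "
    "sampled bytes: 12288, sampled count: 1, stack: "
    "tcmalloc::tcmalloc_internal::SampleifyAllocation<>() "
    "tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>() "
    "tcmalloc::tcmalloc_internal::slow_alloc_small<>() __libc_calloc "
    "td_new od_router_route od_frontend mm_scheduler_main mm_context_runner "
    "...... Total estimated bytes: 3159712 (3085.656250KB)"
)

entries = parse_heap_snapshot(SAMPLE)

# Sum estimated bytes per first non-allocator frame.
bytes_by_site = defaultdict(int)
for entry in entries:
    bytes_by_site[first_app_frame(entry)] += entry.estimated_bytes

for site, total in sorted(bytes_by_site.items(), key=lambda kv: -kv[1]):
    print(f"{total:>10} bytes  {site}")
```

Comparing these per-site totals across snapshots taken N seconds apart is one way to spot call sites whose estimated usage keeps growing.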