
2.27.0.0-b559

tagged this 12 Sep 04:14
Summary:
The connection manager is compiled and linked with Google's tcmalloc library, the high-performance memory allocator also used by other core YugabyteDB processes such as the Tablet Server (tserver) and Master. Consequently, malloc and free calls made within the connection manager are overridden by tcmalloc's implementation.

This change adds the ability to view tcmalloc memory statistics for the connection manager process by dumping them into the connection manager's own log files. It is intended purely as an aid for fast debugging.

**Implementation Details:**
This enhancement provides the infrastructure to periodically capture tcmalloc heap snapshots. The implementation also links the necessary tcmalloc utility functions from the src/yb/util directory, making them available to the connection manager.

The functionality is controlled by two new GFlags:

 - --ysql_conn_mgr_dump_heap_snapshot_interval: An integer flag (default: 0) that enables heap dumps and sets their frequency. When it is set to a positive value N, the existing cron thread dumps a point-in-time heap snapshot of the connection manager process every N seconds. The output includes estimated memory allocations and the full stack traces responsible for allocating them.

 - --ysql_conn_mgr_tcmalloc_sample_period: An integer flag that sets the granularity of memory profiling as the tcmalloc sampling period in bytes. It takes effect only when heap dumping is enabled (--ysql_conn_mgr_dump_heap_snapshot_interval > 0); in that case it defaults to 1 MB, consistent with the tserver and master configurations.
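As a rough illustration of how the two flags interact, here is a minimal Python sketch of the semantics described above. The helper names (`resolve_config`, `should_dump`, `dump_times`) are hypothetical, treating a negative interval as "disabled" is an assumption, and "1 MB" is assumed to mean 1,048,576 bytes. The real logic lives in the connection manager and its cron thread and may differ in detail.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumption: "1 MB" in the flag description means 1 MiB.
DEFAULT_SAMPLE_PERIOD_BYTES = 1024 * 1024


@dataclass(frozen=True)
class HeapDumpConfig:
    enabled: bool
    interval_s: int                      # seconds between dumps (0 when disabled)
    sample_period_bytes: Optional[int]   # None when dumping is disabled


def resolve_config(dump_heap_snapshot_interval: int = 0,
                   tcmalloc_sample_period: Optional[int] = None) -> HeapDumpConfig:
    """Mirror the documented flag semantics (hypothetical helper)."""
    if dump_heap_snapshot_interval <= 0:
        # Dumping disabled: the sample-period flag has no effect.
        return HeapDumpConfig(False, 0, None)
    period = (DEFAULT_SAMPLE_PERIOD_BYTES if tcmalloc_sample_period is None
              else tcmalloc_sample_period)
    if period <= 0:
        raise ValueError("sample period must be a positive number of bytes")
    return HeapDumpConfig(True, dump_heap_snapshot_interval, period)


def should_dump(cfg: HeapDumpConfig, now_s: float, last_dump_s: float) -> bool:
    """Checked on every cron tick; fires once at least N seconds have elapsed."""
    return cfg.enabled and now_s - last_dump_s >= cfg.interval_s


def dump_times(cfg: HeapDumpConfig, ticks: Iterable[float],
               start_s: float = 0) -> List[float]:
    """Simulate the cron thread and return the ticks at which a dump happens."""
    last, fired = start_s, []
    for now in ticks:
        if should_dump(cfg, now, last):
            fired.append(now)
            last = now
    return fired
```

Under these assumptions, `dump_times(resolve_config(3), range(11))` yields `[3, 6, 9]`, while an interval of 0 produces no dumps at all.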

**Future Extension:**
The next planned enhancement (tracked in issue [#28442](https://github.com/yugabyte/yugabyte-db/issues/28442)) is to automatically trigger a heap snapshot dump when the connection manager hits a predefined memory throttling limit, replicating the memory-based dumping mechanism already implemented for the tserver and master processes.
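One way such a trigger could behave is sketched below in Python. Everything here is an assumption rather than a description of the planned change or of the tserver/master mechanism: the class name, the choice to dump once per excursion above the limit, and re-arming only after usage falls below a fraction of the limit (0.9 by default), which avoids dumping on every check while memory stays high.

```python
class ThresholdDumpTrigger:
    """Hypothetical edge-triggered heap-dump trigger with hysteresis."""

    def __init__(self, limit_bytes: int, rearm_fraction: float = 0.9):
        if limit_bytes <= 0 or not 0 < rearm_fraction <= 1:
            raise ValueError("invalid limit or re-arm fraction")
        self.limit_bytes = limit_bytes
        self.rearm_below = limit_bytes * rearm_fraction
        self._armed = True

    def observe(self, usage_bytes: int) -> bool:
        """Return True exactly when a dump should be taken for this sample."""
        if self._armed and usage_bytes >= self.limit_bytes:
            self._armed = False          # dump once per excursion above the limit
            return True
        if not self._armed and usage_bytes < self.rearm_below:
            self._armed = True           # usage dropped enough; allow the next dump
        return False
```

With a limit of 100 bytes, the usage series 50, 100, 120, 95, 85, 101 produces dumps only at 100 and 101: the drop to 85 re-arms the trigger, but 95 does not.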

**Benefits:**
- Pinpoint Memory Leaks: Enables developers to correlate memory growth with specific data structures and the exact call stacks that allocated them.
- Reproduce Issues: Provides consistent, detailed metrics that help reproduce memory-related bugs in a controlled environment.
- Deepen Memory Insights: Offers a clear view of how different components and data structures consume memory within the connection manager, aiding in performance tuning and optimization.
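To make the first benefit concrete, you can take two dumps some time apart, group estimated bytes by call stack, and subtract to see which allocation sites grew. The following is a minimal Python sketch. It assumes each dump has already been reduced to a mapping from stack (a tuple of frame names) to estimated bytes. `top_growth` is a hypothetical helper, not part of YugabyteDB.

```python
from typing import Dict, List, Tuple

Stack = Tuple[str, ...]


def top_growth(before: Dict[Stack, int], after: Dict[Stack, int],
               limit: int = 5) -> List[Tuple[Stack, int]]:
    """Return the call stacks whose estimated bytes grew most between dumps."""
    deltas = []
    for stack in before.keys() | after.keys():
        delta = after.get(stack, 0) - before.get(stack, 0)
        if delta > 0:                      # ignore stacks that shrank or held steady
            deltas.append((stack, delta))
    # Largest growth first; ties broken by stack for deterministic output.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return deltas[:limit]
```

A stack that keeps climbing across successive dumps, such as the `yb_kiwi_var_push` path in the example below, is a natural first suspect for a leak.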

**Example output in connection manager logs:**

```
Estimated bytes: 124416 (121.500000KB) , estimated count: 36, sampled bytes: 6912, sampled count: 2, stack:
tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
tcmalloc::tcmalloc_internal::slow_alloc_small<>()
__libc_realloc
yb_kiwi_var_push
od_backend_update_parameter
od_backend_ready_wait
od_reset
od_frontend_cleanup
od_frontend
mm_scheduler_main
mm_context_runner

Estimated bytes: 73728 (72.000000KB) , estimated count: 6, sampled bytes: 12288, sampled count: 1, stack:
tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
tcmalloc::tcmalloc_internal::slow_alloc_small<>()
__libc_calloc
td_new
od_router_route
od_frontend
mm_scheduler_main
mm_context_runner

......

Total estimated bytes: 3159712 (3085.656250KB)

```
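For large dumps, it is easier to load the output into a script than to read it by eye. The Python sketch below parses the format shown above into per-stack records plus the reported total. The regular expressions are inferred only from this example (an assumption), and `parse_heap_dump` and `HeapRecord` are hypothetical names. A blank line or the `......` elision marker ends the current stack.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Patterns inferred from the example output above; real logs may differ.
HEADER_RE = re.compile(
    r"Estimated bytes: (\d+) \([\d.]+KB\)\s*,\s*estimated count: (\d+),"
    r"\s*sampled bytes: (\d+),\s*sampled count: (\d+),\s*stack:")
TOTAL_RE = re.compile(r"Total estimated bytes: (\d+)")


@dataclass
class HeapRecord:
    estimated_bytes: int
    estimated_count: int
    sampled_bytes: int
    sampled_count: int
    stack: List[str] = field(default_factory=list)  # innermost frame first


def parse_heap_dump(text: str) -> Tuple[List[HeapRecord], Optional[int]]:
    records: List[HeapRecord] = []
    total: Optional[int] = None
    current: Optional[HeapRecord] = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.search(line)
        if header:
            current = HeapRecord(*(int(g) for g in header.groups()))
            records.append(current)
            continue
        total_match = TOTAL_RE.search(line)
        if total_match:
            total, current = int(total_match.group(1)), None
            continue
        if not line or set(line) == {"."}:
            current = None                # blank line or "......" ends a stack
        elif current is not None:
            current.stack.append(line)    # one stack frame per line
    return records, total
```

For example, `max(records, key=lambda r: r.estimated_bytes).stack` returns the frames of the heaviest allocation site in a dump.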
Jira: DB-18054

Test Plan: Jenkins: all tests

Reviewers: skumar, asrinivasan, vikram.damle, arpit.saxena

Reviewed By: arpit.saxena

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D46224