Summary:
### Motivation
We can currently get the PG heap snapshot on demand, either through YSQL (yb_backend_heap_snapshot/yb_backend_heap_snapshot_peak) or through the logs (yb_log_backend_heap_snapshot/yb_log_backend_heap_snapshot_peak). However, both methods require connecting to the cluster while the target backend is still running.
### Solution
To make debugging easier, this diff adds the ability to have selected backends dump their heap snapshot to the logs when they exit. This is useful when memory issues occur in short-lived backends or occur only intermittently.
This diff adds a threshold setting that, when enabled, tells a backend to dump its peak heap snapshot to the logs on exit if its peak memory usage reached the threshold. The snapshot lists the call stacks at which `malloc`/`new` was called, along with the total amount of memory allocated by each call stack, as of the moment the process was at its peak memory usage. The snapshot comes from TCMalloc's sampling profile, so the reported sizes and counts are estimates.
### Interface
This diff adds a GUC, `yb_log_heap_snapshot_on_exit_threshold`, and a T-server gflag wrapper for it (`ysql_yb_log_heap_snapshot_on_exit_threshold`). Both are integer-typed and off (`-1`) by default. The GUC is `PGC_USERSET`, so users can change it at any point while the backend is running. The T-server flag is a runtime gflag, so it can likewise be changed at any time and takes effect immediately.
The GUC has `GUC_UNIT_KB`, so it can be set using memory units, and a bare number is interpreted as kilobytes. For example: `SET yb_log_heap_snapshot_on_exit_threshold='1MB';`
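As a worked example of the threshold semantics, the sketch below converts a setting such as `'1MB'` into kilobytes (a bare number is already in KB, and `kB`/`MB`/`GB` use PostgreSQL's 1024 multiplier) and then applies the on-exit check, treating a negative threshold as disabled. The function names `example_size_to_kb` and `example_should_dump` are hypothetical and only illustrate the arithmetic; the real value is parsed by PostgreSQL's GUC machinery, not by this code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a memory setting such as "1MB", "1024 kB",
 * or "10240" to KB. A bare number is taken as KB (GUC_UNIT_KB).
 * Returns false if the number or unit cannot be parsed.
 */
bool
example_size_to_kb(const char *value, long *kb_out)
{
	char	   *end;
	long		n = strtol(value, &end, 10);

	if (end == value)
		return false;			/* no digits at all */
	while (*end == ' ')
		end++;					/* allow whitespace before the unit */

	if (*end == '\0' || strcmp(end, "kB") == 0)
		*kb_out = n;
	else if (strcmp(end, "MB") == 0)
		*kb_out = n * 1024L;
	else if (strcmp(end, "GB") == 0)
		*kb_out = n * 1024L * 1024L;
	else
		return false;			/* unknown unit */
	return true;
}

/*
 * Hypothetical check mirroring the described behavior: dump on exit only
 * if the threshold is enabled (non-negative) and peak RSS reached it.
 */
bool
example_should_dump(long peak_rss_kb, long threshold_kb)
{
	return threshold_kb >= 0 && peak_rss_kb >= threshold_kb;
}
```

For instance, the sample log in the test plan reports a peak RSS of 51748 KB against a 10000 KB threshold, which satisfies this check, while the default of `-1` never does.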
### Implementation
During Postgres process startup, an `on_proc_exit` hook is installed regardless of whether the threshold is set. When the process exits, the hook gets the process's peak RSS from the OS via `getrusage` and, if the peak RSS is greater than or equal to `yb_log_heap_snapshot_on_exit_threshold`, dumps the peak heap snapshot to the PG logs. When the threshold is off (`-1`), the hook is a no-op.
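The exit-hook flow above can be sketched as follows. This is a standalone approximation, not the diff's code: PostgreSQL's `on_proc_exit` takes a callback of type `void (*)(int code, Datum arg)`, so to stay runnable outside Postgres the sketch registers with C's `atexit` instead. `example_threshold_kb` stands in for the GUC/gflag value, and `example_dump_peak_heap_snapshot` is a hypothetical placeholder that only prints a header line rather than collecting a TCMalloc profile.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Stand-in for yb_log_heap_snapshot_on_exit_threshold, in KB; -1 means off. */
long		example_threshold_kb = -1;

/* Peak RSS of this process in KB, or -1 if getrusage() fails. */
long
example_peak_rss_kb(void)
{
	struct rusage usage;

	if (getrusage(RUSAGE_SELF, &usage) != 0)
		return -1;
#if defined(__APPLE__)
	return (long) (usage.ru_maxrss / 1024);	/* macOS reports bytes */
#else
	return (long) usage.ru_maxrss;	/* Linux reports kilobytes */
#endif
}

/* Hypothetical placeholder for dumping the peak heap snapshot to the logs. */
void
example_dump_peak_heap_snapshot(long peak_kb)
{
	fprintf(stderr,
			"Peak heap snapshot of PID %ld (peak RSS: %ld KB, threshold: %ld KB)\n",
			(long) getpid(), peak_kb, example_threshold_kb);
}

/* Core check; returns true if a snapshot was dumped. */
bool
example_maybe_dump_on_exit(void)
{
	long		peak_kb;

	if (example_threshold_kb < 0)
		return false;			/* disabled: the hook is a no-op */
	peak_kb = example_peak_rss_kb();
	if (peak_kb < 0 || peak_kb < example_threshold_kb)
		return false;
	example_dump_peak_heap_snapshot(peak_kb);
	return true;
}

/* atexit() requires void (*)(void); this wrapper discards the result. */
void
example_exit_hook(void)
{
	(void) example_maybe_dump_on_exit();
}

/* Installed unconditionally at startup; returns atexit()'s status. */
int
example_install_exit_hook(void)
{
	return atexit(example_exit_hook);
}
```

Installing the hook unconditionally keeps the setting changeable at runtime: the threshold is read only when the process exits, so a value set late in the backend's life still applies.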
### Misc changes
Refactored the repeated code that gets the peak RSS from the OS into a separate function in `pg_yb_utils.c`.
Jira: DB-17035
Test Plan:
Manual test
1. Using the GUC:
```
./yb_build.sh release --sj
bin/yb-ctl wipe_restart
bin/ysqlsh
yugabyte=# SET yb_log_heap_snapshot_on_exit_threshold = '1024 kB';
yugabyte=# \q
```
Observe PG logs. See sample output below.
2. Using the T-server flag:
```
./yb_build.sh release --sj
bin/yb-ctl wipe_restart --tserver_flags ysql_yb_log_heap_snapshot_on_exit_threshold=10240
bin/ysqlsh
yugabyte=# \q
```
Observe PG logs:
```
2025-06-04 05:51:47.629 UTC [2113950] LOG: Peak heap snapshot of PID 2113950 (peak RSS: 51748 KB, threshold: 10000 KB):
I0604 05:51:47.629959 2113950 tcmalloc_profile.cc:145] Analyzing TCMalloc sampling profile
W0604 05:51:47.637796 2113950 tcmalloc_profile.cc:194] Failed to symbolize 11 symbols
I0604 05:51:47.637849 2113950 ybc_pggate.cc:778] Heap Profile:
I0604 05:51:47.637856 2113950 ybc_pggate.cc:782] estimated bytes: 2,113,536, estimated count: 2, sampled_allocated bytes: 1,056,768, sampled count: 1, call stack:
tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
tcmalloc::tcmalloc_internal::slow_alloc_large<>()
__libc_malloc
AllocSetAlloc
MemoryContextAlloc
pq_init
BackendInitialize
ServerLoop
PostmasterMain
Failed to symbolize
main
__libc_start_main
================
I0604 05:51:47.637864 2113950 ybc_pggate.cc:782] estimated bytes: 1,056,768, estimated count: 129, sampled_allocated bytes: 8,192, sampled count: 1, call stack:
tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
tcmalloc::tcmalloc_internal::slow_alloc_small<>()
__libc_malloc
AllocSetContextCreateInternal
CreateCacheMemoryContext
RelationCacheInitialize
InitPostgresImpl
InitPostgres
PostgresMain
Failed to symbolize
Failed to symbolize
PostmasterMain
Failed to symbolize
main
__libc_start_main
================
I0604 05:51:47.637867 2113950 ybc_pggate.cc:782] estimated bytes: 1,048,576, estimated count: 128, sampled_allocated bytes: 8,192, sampled count: 1, call stack:
tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
tcmalloc::tcmalloc_internal::slow_alloc_small<>()
__libc_malloc
AllocSetAlloc
palloc
InitDeadLockChecking
ServerLoop
PostmasterMain
Failed to symbolize
main
__libc_start_main
================
I0604 05:51:47.637871 2113950 ybc_pggate.cc:782] estimated bytes: 1,048,576, estimated count: 128, sampled_allocated bytes: 8,192, sampled count: 1, call stack:
tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
tcmalloc::tcmalloc_internal::slow_alloc_small<>()
__libc_malloc
AllocSetContextCreateInternal
hash_create
EnablePortalManager
InitPostgresImpl
InitPostgres
PostgresMain
Failed to symbolize
Failed to symbolize
PostmasterMain
Failed to symbolize
main
__libc_start_main
================
I0604 05:51:47.637873 2113950 ybc_pggate.cc:782] estimated bytes: 1,048,576, estimated count: 256, sampled_allocated bytes: 4,096, sampled count: 1, call stack:
tcmalloc::tcmalloc_internal::SampleifyAllocation<>()
tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()
tcmalloc::tcmalloc_internal::slow_alloc_small<>()
__libc_malloc
yb::SharedArena()
yb::pggate::PgDml::PgDml()
yb::pggate::PgDmlRead::PgDmlRead()
yb::pggate::PgSelect::PgSelect()
yb::pggate::PgSelect::Make()
yb::pggate::PgApiImpl::NewSelect()
YBCPgNewSelect
ybcBeginScan
ybc_systable_begin_default_scan
InitPostgresImpl
InitPostgres
PostgresMain
Failed to symbolize
Failed to symbolize
PostmasterMain
Failed to symbolize
main
__libc_start_main
================
```
Reviewers: sanketh
Reviewed By: sanketh
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D44500