gapisback Core changes to support running Splinter with allocated shared memory.
42799b1 Aug 30, 2023
Core changes to support running Splinter with allocated shared memory.
Support for running SplinterDB with shared memory configured for most
memory allocation is an -EXPERIMENTAL- feature added with this commit.

This commit brings in basic support to create a shared memory segment and
to redirect all memory allocation primitives to shared memory. Currently,
we support only simplistic memory management: allocation only, plus very
simple handling of free() for the most recently allocated memory piece.
With shared segments of 1-2 GiB we can run all functional and unit tests.
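
As a rough illustration of the allocation-only scheme described above,
here is a minimal sketch in C. The names (bump_arena, bump_alloc,
bump_free) are hypothetical and are not the interfaces added by this
commit; the real shmem module may track its state differently. The sketch
does no locking, so it is not safe for concurrent callers as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator state; not the shmem module's actual layout. */
typedef struct {
   uint8_t *base; /* start of the segment */
   size_t   size; /* segment size in bytes */
   size_t   next; /* offset of the next free byte */
   size_t   last; /* offset of the most recent allocation */
} bump_arena;

static void
bump_init(bump_arena *a, void *mem, size_t size)
{
   assert(mem != NULL);
   a->base = mem;
   a->size = size;
   a->next = 0;
   a->last = 0;
}

/* Allocation only moves the 'next' offset forward. */
static void *
bump_alloc(bump_arena *a, size_t nbytes)
{
   if (nbytes > a->size) {
      return NULL; /* can never fit; also keeps the rounding from wrapping */
   }
   size_t need = (nbytes + 15) & ~(size_t)15; /* keep 16-byte alignment */
   if (need > a->size - a->next) {
      return NULL; /* segment exhausted */
   }
   a->last = a->next;
   a->next += need;
   return a->base + a->last;
}

/* Only the most recently allocated piece is reclaimed; any other free
 * is a no-op and that memory stays in use until the segment goes away. */
static void
bump_free(bump_arena *a, void *ptr)
{
   if (a->next > a->last && (uint8_t *)ptr == a->base + a->last) {
      a->next = a->last;
   }
}
```

Freeing anything other than the last piece changes nothing, which is why
the segment has to be sized generously for the workload.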

The high-points of the changes are:

- External configuration: splinterdb_config{} gains a few new visible
  fields to configure and troubleshoot shared memory configuration.
   - Boolean: use_shmem: Default is OFF
   - size_t : shmem_size:

- The main driving change is the re-deployment of platform_heap_id 'hid'
  arg that appears in all memory-related interfaces. If Splinter is
  configured for shared memory use, 'hid' will be an opaque handle to
  the shared segment. Most memory allocation will be redirected to new
  shmem-based alloc() / free() interfaces.

- Formalize usages of PROCESS_PRIVATE_HEAP_ID: A small number of clients
  that wish to repeatedly allocate large chunks of memory tend to cause
  OOMs. The memory allocated by these clients is not shared across threads
  / processes. For such usages, introduce PROCESS_PRIVATE_HEAP_ID as an
  alias to NULL, defaulting to allocating memory from the heap.

- Manage the heap-ID in platform_get_heap_id() so that it correctly
  returns the handle to shared memory. (Otherwise, it would return
  NULL by default.)

- BTree pack allocates a large fingerprint array. This also causes large
  tests to run into OOMs. For threaded execution, it's ok if the memory
  for this array is allocated from the heap. But for multi-process
  execution, when one process (thread) allocates this fingerprint
  array, another process (thread) may pick up the task to compact a
  bundle and will try to free this memory.

  So, this memory has to come from shared memory. To cope with such
  repeated allocations of large chunks of memory to build fingerprints,
  the shmem module supports a small scheme for recycling such freed
  large-memory chunks.

  Applied this technique to recycle memory allocated for iterators also.
  They tend to be biggish, so they can also cause shmem-OOMs.

- All existing functional and unit-tests have been enhanced to
  support the "--use-shmem" argument. This creates Splinter with
  shared memory configured, and the tests are run in this mode.

  This change gives the new feature good coverage from existing tests.

   - New test: large_inserts_bugs_stress_test -- added to cover the
     primary use-case of concurrent insert performance benchmarking
     (which this feature was driving in a prior integration effort).

   - test.sh enhanced to run different classes of test with the
     "--use-shmem" option.

- Diagnostics & Troubleshooting:

   - Shmem-based alloc/free interfaces extended to print the name of the
     object and other call-site info, to better pinpoint the source
     code-flow leading to memory issues.

   - Add shared memory usage metrics, including for large-fragment
     handling. Print a summary line of these metrics when Splinter is
     shut down (closed).

   - Add various utility diagnostic helper methods to validate that
     addresses within shared memory are valid. Unit-tests and some asserts
     use these.

- Minor #include cleanups.
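
To make the heap-ID dispatch, the large-fragment recycling and the
address-validation helpers from the list above more concrete, here is a
minimal single-process sketch in C. Every name in it (sketch_shm,
sketch_alloc, sketch_free, sketch_shm_contains, SKETCH_PRIVATE_HEAP_ID)
is hypothetical, as are the 1 KiB "large" cutoff and the 4-slot recycle
table; none of this is the shmem module's actual interface, and a real
implementation would need synchronization across threads and processes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names here are hypothetical stand-ins, not the shmem module's API. */
#define SKETCH_PRIVATE_HEAP_ID NULL     /* role of PROCESS_PRIVATE_HEAP_ID */
#define SKETCH_LARGE_MIN ((size_t)1024) /* assumed cutoff for "large" */
#define SKETCH_NSLOTS 4                 /* assumed number of recycle slots */

typedef struct {
   uint8_t *base;                     /* start of the (simulated) segment */
   size_t   size;                     /* total bytes in the segment */
   size_t   used;                     /* offset of the next never-used byte */
   void    *free_addr[SKETCH_NSLOTS]; /* freed large fragments; NULL if empty */
   size_t   free_size[SKETCH_NSLOTS]; /* size of each freed fragment */
} sketch_shm;

typedef sketch_shm *sketch_heap_id; /* NULL means "use the process heap" */

/* Round a request up to a multiple of 16 so fragments stay aligned. */
static size_t
sketch_round_up(size_t nbytes)
{
   return (nbytes + 15) & ~(size_t)15;
}

/* Diagnostic: does ptr point into the handed-out part of the segment? */
static bool
sketch_shm_contains(const sketch_shm *shm, const void *ptr)
{
   uintptr_t p  = (uintptr_t)ptr;
   uintptr_t lo = (uintptr_t)shm->base;
   return p >= lo && p < lo + shm->used;
}

static void *
sketch_alloc(sketch_heap_id hid, size_t nbytes)
{
   if (hid == SKETCH_PRIVATE_HEAP_ID) {
      return malloc(nbytes); /* process-private, never shared */
   }
   if (nbytes > hid->size) {
      return NULL; /* can never fit; also keeps the rounding from wrapping */
   }
   nbytes = sketch_round_up(nbytes);
   if (nbytes >= SKETCH_LARGE_MIN) {
      /* First try to recycle a previously freed large fragment. */
      for (int i = 0; i < SKETCH_NSLOTS; i++) {
         if (hid->free_addr[i] != NULL && hid->free_size[i] >= nbytes) {
            void *frag        = hid->free_addr[i];
            hid->free_addr[i] = NULL;
            return frag;
         }
      }
   }
   if (nbytes > hid->size - hid->used) {
      return NULL; /* segment exhausted: the shmem-OOM case */
   }
   void *frag = hid->base + hid->used;
   hid->used += nbytes;
   return frag;
}

static void
sketch_free(sketch_heap_id hid, void *ptr, size_t nbytes)
{
   if (hid == SKETCH_PRIVATE_HEAP_ID) {
      free(ptr);
      return;
   }
   assert(sketch_shm_contains(hid, ptr)); /* catch stray addresses early */
   nbytes = sketch_round_up(nbytes);
   if (nbytes < SKETCH_LARGE_MIN) {
      return; /* small fragments are not reclaimed in this sketch */
   }
   for (int i = 0; i < SKETCH_NSLOTS; i++) {
      if (hid->free_addr[i] == NULL) {
         hid->free_addr[i] = ptr;
         hid->free_size[i] = nbytes;
         return;
      }
   }
   /* No empty slot: the fragment is lost until the segment is destroyed. */
}
```

In this sketch a recycled fragment may be larger than the request it
serves, so some slack can be wasted; that trade-off keeps the bookkeeping
to a small fixed-size table.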

Changes arising through review cycle and stabilization vs. /main:

- In test.sh/run_slower_unit_tests(), re-enable execution of
  large_inserts_bugs_stress_test, but bracketed by "set +e" / "set -e"
  settings. If this test fails in CI (as it does randomly), this
  toggling of "set -e" should allow the rest of the script to still
  run, so that the CI job does not fail immediately.
  (Some deeper stabilization is needed for these test cases.)

- Purged the heap_handle * in shmem.h/.c module and through the rest
  of the Splinter code. Heap-ID is now the only valid handle.

- Fix race condition bug in platform_shm_alloc()

- Add micro-optimization to recycle the last-allocated fragment when it
  is freed.

- Add config_parse_use_shmem() as parsing interface to see if
  "--use-shmem" was supplied. Apply to many unit-/functional-tests.

Rework unit-tests to use config_parse_use_shmem() to support --use-shmem parsing.
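
For illustration only, a check for the "--use-shmem" argument could look
like the sketch below. The helper name sketch_parse_use_shmem() and its
signature are assumptions; the actual signature and behavior of
config_parse_use_shmem() are not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if any argument is exactly
 * "--use-shmem". It does not consume or reorder the arguments.
 */
static bool
sketch_parse_use_shmem(int argc, char *argv[])
{
   assert(argc == 0 || argv != NULL);
   for (int i = 0; i < argc; i++) {
      if (strcmp(argv[i], "--use-shmem") == 0) {
         return true;
      }
   }
   return false;
}
```

A test's setup code could call such a helper once and use the result to
decide whether to enable use_shmem in its splinterdb_config{}.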

Re-enable large_inserts_bugs_stress_test execution.