Summary:
The map `session_virtual_wal_` maintains mapping from session ID to a pointer to the Virtual WAL instance. Each tserver maintains its own map, and an entry is added to the map whenever a new Virtual WAL is initialised. Now, there are two deleters from this map:
# `DestroyVirtualWALForCDC()` : This comes into action when a walsender exits cleanly.
# `DestroyVirtualWALBatchForCDC()` : This comes into action when a walsender crashes. This is called by `CleanupSessions()`, which is a bg thread to remove expired sessions. Here we first identify which of the expired sessions have a corresponding VWAL and then we destroy the Virtual WAL and delete the entry from `session_virtual_wal_`.
In the second deleter's code path, we acquire a read lock on `session_virtual_wal_` when we are filtering the expired sessions to find out the ones with a corresponding Virtual WAL. We then relinquish this read lock and acquire a write lock at the time of performing the destroy Virtual WAL and deletion from the map. There exists a period in between the two steps where we release the lock and the first deletion path mentioned above ( i.e `DestroyVirtualWALForCDC()`) may come in and delete the entry from `session_virtual_wal_`. Now when deleter 2 tries to access this entry for destroying Virtual WAL we get a segmentation fault.
To fix this issue, we are getting rid of the filtering logic. Instead `DestroyVirtualWALBatchForCDC()` will be directly called from CleanupSessions() and in the critical session protected by the lock, we will perform necessary deletion if the expired session exists in `session_virtual_wal_`.
Jira: DB-18021
Test Plan:
QA tests
Adding a UT to repro this test is not possible right now, since we do not have test infra to run walsender in c++ tests. Whereas the java tests do not have the syncpoint infra to deterministically reproduce the race. Ticket for adding the UT: https://yugabyte.atlassian.net/browse/DB-18034
Reviewers: skumar, asrinivasan
Reviewed By: asrinivasan
Subscribers: yql, ycdcxcluster
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D46229