Summary:
The following lock order inversion can occur:
CatalogManager::AlterTable acquires the catalog manager lock, then tries to replicate the altered table information, which requires the Raft replica lock.
MasterSnapshotCoordinator::CreateReplicated is invoked while the replica lock is held by the apply thread. It then tries to get tablets info in order to schedule operations.
But obtaining tablets info requires the catalog manager lock.
This deadlock resolves itself when the alter table operation times out.
But until that timeout fires, all heartbeats and other operations that require the catalog manager lock are blocked.
Fixed by using a separate thread pool to schedule tablet operations.
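
The idea behind the fix can be illustrated with a minimal standalone sketch. It is not the actual change: SingleWorkerPool, RunScenario, catalog_lock and replica_lock are hypothetical names, and plain std::mutex stands in for the real catalog manager and replica locks. The apply path only enqueues work while holding the replica lock, so it never waits for the catalog lock itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-worker pool; stands in for the separate thread pool.
class SingleWorkerPool {
 public:
  SingleWorkerPool() : worker_([this] { Run(); }) {}

  ~SingleWorkerPool() {
    {
      std::lock_guard<std::mutex> l(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();  // Remaining tasks are drained before the worker exits.
  }

  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> l(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> l(mutex_);
        cv_.wait(l, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // Stop requested and queue is empty.
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // Runs without holding the pool's internal mutex.
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread worker_;  // Declared last so it starts after the other members.
};

// Returns the number of scheduled operations, or -1 if the alter path failed.
int RunScenario() {
  std::mutex catalog_lock;  // Assumption: models the catalog manager lock.
  std::mutex replica_lock;  // Assumption: models the Raft replica lock.
  int scheduled_ops = 0;    // Guarded by catalog_lock.
  bool altered = false;
  {
    SingleWorkerPool pool;

    // "AlterTable" path: takes the catalog lock first.
    std::unique_lock<std::mutex> catalog(catalog_lock);

    // "Apply" path: holds the replica lock and only enqueues the work that
    // needs the catalog lock, instead of acquiring that lock inline.
    std::thread apply([&] {
      std::lock_guard<std::mutex> replica(replica_lock);
      pool.Submit([&] {
        std::lock_guard<std::mutex> l(catalog_lock);
        ++scheduled_ops;
      });
    });
    apply.join();  // Completes even though the catalog lock is still held.

    {
      // "AlterTable" can now take the replica lock without deadlocking.
      std::lock_guard<std::mutex> replica(replica_lock);
      altered = true;
    }
    catalog.unlock();
  }  // Pool destructor waits for the queued task to finish.
  return altered ? scheduled_ops : -1;
}
```

If the lambda passed to Submit were called directly inside the apply thread, the apply thread would block on catalog_lock while holding replica_lock, and the alter path would block on replica_lock while holding catalog_lock, which is the inversion described above.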
Jira: DB-15933
Test Plan: ./yb_build.sh fastdebug --gcc11 --cxx-test yb-admin-snapshot-schedule-test --gtest_filter YbAdminSnapshotScheduleTestWithYsqlColocationRestoreParam.PgsqlSequenceVerifyPartialRestore/DBColocated_Clone -n 40 -- -p 8
Reviewers: mhaddad
Reviewed By: mhaddad
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D43681