2.27.0.0-b91

@spolitov spolitov tagged this 11 May 06:15
Summary:
The following lock order inversion could happen:
CatalogManager::AlterTable acquires the catalog manager lock, then tries to replicate the altered table information, which requires the Raft replica lock.

MasterSnapshotCoordinator::CreateReplicated is invoked while the apply thread holds the replica lock. It then tries to get tablets info in order to schedule operations,
but obtaining tablets info requires the catalog manager lock.

This deadlock resolves itself once the alter table operation times out.
But until then, all heartbeats and other operations that require the catalog manager lock are blocked.

Fixed by using a separate thread pool to schedule tablet operations.
Jira: DB-15933
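
The pattern behind the fix can be sketched in isolation. The code below is a minimal illustration, not the actual YugabyteDB change: `SchedulerPool`, `RunScenario`, `catalog_lock` and `replica_lock` are hypothetical names, and the two plain mutexes only stand in for the catalog manager lock and the Raft replica lock. One thread takes the locks in the "alter table" order, while the other holds the replica lock and, instead of taking the catalog lock itself, hands that work to a separate worker thread.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-worker pool standing in for the separate thread pool.
class SchedulerPool {
 public:
  SchedulerPool() : worker_([this] { Run(); }) {}

  // Drains any queued tasks, then joins the worker.
  ~SchedulerPool() {
    {
      std::lock_guard<std::mutex> l(mutex_);
      stop_ = true;
    }
    cond_.notify_one();
    worker_.join();
  }

  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> l(mutex_);
      tasks_.push(std::move(task));
    }
    cond_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> l(mutex_);
        cond_.wait(l, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested and queue drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // runs without holding the pool mutex
    }
  }

  std::mutex mutex_;
  std::condition_variable cond_;
  std::queue<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};

// Returns true once the deferred "schedule tablet operations" step has run.
bool RunScenario() {
  using namespace std::chrono_literals;
  std::mutex catalog_lock;  // stand-in for the catalog manager lock
  std::mutex replica_lock;  // stand-in for the Raft replica lock
  bool scheduled = false;
  {
    SchedulerPool pool;

    // "Alter table" path: catalog lock first, then replica lock.
    std::thread alter([&] {
      std::lock_guard<std::mutex> c(catalog_lock);
      std::this_thread::sleep_for(10ms);
      std::lock_guard<std::mutex> r(replica_lock);
    });

    // "Apply" path: replica lock is held, so the step that needs the
    // catalog lock is deferred to the pool instead of running inline.
    std::thread apply([&] {
      std::lock_guard<std::mutex> r(replica_lock);
      std::this_thread::sleep_for(10ms);
      pool.Submit([&] {
        std::lock_guard<std::mutex> c(catalog_lock);
        scheduled = true;
      });
    });

    alter.join();
    apply.join();
  }  // pool destructor runs the queued task and joins the worker
  return scheduled;
}
```

Because the deferred task acquires the catalog lock without holding the replica lock, neither thread ever waits for one lock while holding the other in the opposite order. Had the apply thread locked `catalog_lock` inline, the two threads could block each other.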

Test Plan: ./yb_build.sh fastdebug --gcc11 --cxx-test yb-admin-snapshot-schedule-test --gtest_filter YbAdminSnapshotScheduleTestWithYsqlColocationRestoreParam.PgsqlSequenceVerifyPartialRestore/DBColocated_Clone -n 40 -- -p 8

Reviewers: mhaddad

Reviewed By: mhaddad

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D43681