2.25.1.0-b242

@spolitov spolitov tagged this 26 Jan 05:41
Summary:
The test uncovered the following kind of deadlock:
T1:
1) lock table for read
2) shared lock on catalog manager

T2 (create table):
1) unique lock on catalog manager
2) lock table (indexed) for write and then commit

Both threads completed step 1. T1 then tries to acquire a shared lock on the catalog manager but cannot, because T2 holds the unique lock on it. Meanwhile, T2 tries to acquire the commit lock on the table but cannot, because the reader count never drops to 0.
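
The wait cycle above can be replayed deterministically in a toy, single-threaded model. This is only an illustrative sketch: `TableLock`, `CatalogMutex` and `ReplayInterleaving` are hypothetical names, not the real master code, and the model assumes (following the description above) that a table write lock can coexist with readers while the commit needs the reader count to reach 0.

```cpp
#include <cassert>

namespace deadlock_sketch {

// Toy stand-in for a table's read/write/commit lock (hypothetical, not RwcLock).
struct TableLock {
  int readers = 0;
  bool writer = false;

  void ReadLock() { ++readers; }
  void ReadUnlock() { --readers; }

  // In this model a writer may coexist with readers.
  bool TryWriteLock() {
    if (writer) return false;
    writer = true;
    return true;
  }

  // The commit can only proceed once every reader is gone.
  bool TryCommit() const { return writer && readers == 0; }
};

// Toy stand-in for the catalog manager mutex (hypothetical).
struct CatalogMutex {
  bool unique_held = false;
  int shared_holders = 0;

  bool TryLockUnique() {
    if (unique_held || shared_holders > 0) return false;
    unique_held = true;
    return true;
  }

  bool TryLockShared() {
    if (unique_held) return false;
    ++shared_holders;
    return true;
  }
};

struct Outcome {
  bool t1_blocked;
  bool t2_blocked;
};

// Replays the interleaving from the summary: both threads finish step 1,
// then each attempts its step 2 without the other releasing anything.
inline Outcome ReplayInterleaving() {
  TableLock table;
  CatalogMutex catalog;

  table.ReadLock();                               // T1, step 1
  const bool t2_step1 = catalog.TryLockUnique();  // T2, step 1
  (void)t2_step1;

  const bool t1_step2 = catalog.TryLockShared();  // T1, step 2
  const bool t2_write = table.TryWriteLock();     // T2, step 2 (write part)
  const bool t2_commit = table.TryCommit();       // T2, step 2 (commit part)

  return Outcome{!t1_step2, !(t2_write && t2_commit)};
}

}  // namespace deadlock_sketch
```

In this model neither step 2 can succeed until the other thread releases its first lock, which is the cycle described above.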

Fixed this deadlock.

Also added conflict detection for cases where we try to acquire the catalog manager mutex while an object read lock is already held (which can result in deadlocks, as described above).
When acquiring a ReadLock on an RwcLock object (e.g. TableInfo / TabletInfo), we now increment a thread-local counter, which is checked when we try to acquire a shared lock on the catalog manager mutex.
This detection can be enabled in rwc_lock.h by defining RWC_LOCK_TRACK_EXTERNAL_DEADLOCK.
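
A minimal sketch of the thread-local counter idea, assuming an RAII guard for the object read lock. The names `held_object_read_locks`, `ObjectReadLock` and `CanSharedLockCatalogManager` are hypothetical and are not the identifiers used in rwc_lock.h; the real check presumably reports a failure rather than returning a bool.

```cpp
#include <cassert>

namespace detection_sketch {

// Number of object read locks currently held by the calling thread.
// Each thread gets its own copy, so no synchronization is needed.
thread_local int held_object_read_locks = 0;

// Hypothetical RAII guard standing in for a ReadLock on a TableInfo or
// TabletInfo. It only maintains the counter; the actual locking is omitted.
class ObjectReadLock {
 public:
  ObjectReadLock() { ++held_object_read_locks; }
  ~ObjectReadLock() { --held_object_read_locks; }

  ObjectReadLock(const ObjectReadLock&) = delete;
  ObjectReadLock& operator=(const ObjectReadLock&) = delete;
};

// Check to run before taking the catalog manager mutex in shared mode.
// Returns false when the calling thread still holds an object read lock,
// which is the ordering that can lead to the deadlock.
inline bool CanSharedLockCatalogManager() {
  return held_object_read_locks == 0;
}

}  // namespace detection_sketch
```

Because the counter is thread-local, the check flags only read locks held by the thread that is about to lock the catalog manager, not read locks held by other threads.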

Also fixed all other such places found by our tests.
Jira: DB-15019

Test Plan: CppCassandraDriverTest.TestBackfillBatchingEnabled

Reviewers: zdrudi, asrivastava, xCluster, hsunder

Reviewed By: asrivastava, hsunder

Subscribers: esheng, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D41399