2.27.0.0-b295

@spolitov spolitov tagged this 02 Jul 19:43
Summary:
While handling a tablet server heartbeat we call `PopulatePgCatalogVersionInfo`, which executes a DocDB read.
The tablet server heartbeat handler holds the scoped leader shared lock during this call.

The DocDB read could find an intent from a transaction and will then request that transaction's status.
This results in a request for the tablets of the "transactions" table.
That request is processed by the same master process, which will also try to acquire the scoped leader shared lock.

If the master leader changes back and forth at this time, it results in a sys catalog reload,
which will try to acquire the leader lock in exclusive mode.

So we get the following deadlock:
TSHeartbeat holds the shared lock and waits for the read to complete.
The sys catalog reload tries to acquire the lock in exclusive mode, but cannot proceed since the lock is already held in shared mode.
The read waits for transaction status resolution, which also tries to acquire the lock in shared mode, but it cannot proceed since an exclusive mode lock has already been requested.

This diff moves the call to `PopulatePgCatalogVersionInfo` out of the shared leader lock scope, resolving this deadlock.
Jira: DB-17374
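
The shape of the fix can be sketched as follows. This is a minimal, self-contained illustration and not the actual YugabyteDB code: `LeaderLock`, `ScopedLeaderSharedLock`, `HeartbeatResponse`, `CatalogVersionReader` and `HandleHeartbeat` are all hypothetical names, and the callback stands in for the DocDB read done by `PopulatePgCatalogVersionInfo`. The point it shows is only the ordering: leader-only work runs inside the shared lock scope, and the potentially blocking read runs after that scope has ended.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <shared_mutex>
#include <string>

// All names below are hypothetical stand-ins, not YugabyteDB's actual classes.

// Reader-writer lock that also counts its current shared holders, so the
// example can observe whether a callback runs inside the shared scope.
class LeaderLock {
 public:
  void LockShared() {
    mutex_.lock_shared();
    ++shared_holders_;
  }
  void UnlockShared() {
    assert(shared_holders_ > 0);
    --shared_holders_;
    mutex_.unlock_shared();
  }
  int shared_holders() const { return shared_holders_.load(); }

 private:
  std::shared_mutex mutex_;
  std::atomic<int> shared_holders_{0};
};

// RAII guard playing the role of the scoped leader shared lock.
class ScopedLeaderSharedLock {
 public:
  explicit ScopedLeaderSharedLock(LeaderLock& lock) : lock_(lock) {
    lock_.LockShared();
  }
  ~ScopedLeaderSharedLock() { lock_.UnlockShared(); }
  ScopedLeaderSharedLock(const ScopedLeaderSharedLock&) = delete;
  ScopedLeaderSharedLock& operator=(const ScopedLeaderSharedLock&) = delete;

 private:
  LeaderLock& lock_;
};

struct HeartbeatResponse {
  bool leader_work_done = false;
  std::string catalog_version_info;
};

// Stand-in for a read that may block on transaction status resolution.
using CatalogVersionReader = std::function<std::string()>;

HeartbeatResponse HandleHeartbeat(LeaderLock& leader_lock,
                                  const CatalogVersionReader& read_catalog_version) {
  HeartbeatResponse resp;
  {
    ScopedLeaderSharedLock guard(leader_lock);
    resp.leader_work_done = true;  // placeholder for work that needs the lock
  }  // shared lock released here

  // The read runs without the shared lock held, so a pending exclusive
  // acquisition (such as a sys catalog reload) is not blocked by this handler.
  resp.catalog_version_info = read_catalog_version();
  return resp;
}
```

In this sketch a callback passed to `HandleHeartbeat` sees `shared_holders()` equal to zero, which is the property that breaks the cycle: the handler no longer holds the shared lock while it waits on the read.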

Test Plan: Jenkins

Reviewers: hsunder, #db-approvers

Reviewed By: hsunder, #db-approvers

Subscribers: slingam, svc_phabricator, patnaik.balivada, zdrudi, myang, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D45088