[dst] Prefer push-based mechanism for signaling waiting transactions #13578

Closed
robertsami opened this issue Aug 11, 2022 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@robertsami
Contributor

robertsami commented Aug 11, 2022

Jira Link: DB-3157
Any waiting transaction blocked in the WaitQueue will be unblocked as soon as the WaitQueue detects that its blockers have been resolved. We currently rely on a pull-based mechanism where we poll the TransactionStatusManager on a regular basis. The TransactionStatusManager may (and usually does) trigger an RPC to the status tablet to check the blocking transaction's status.

While we may want to keep this poll-based mechanism as a safeguard, we should rely primarily on a push-based mechanism, in which the relevant WaitQueues are signaled as soon as a local RunningTransaction instance learns of a new transaction status. We could then also increase the polling interval of the WaitQueue's pull-based resolution mechanism.
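The difference between the two paths can be illustrated with a minimal, single-threaded sketch. Every name below (`SketchWaitQueue`, `SignalStatus`, `StatusLookup`, and so on) is hypothetical and does not mirror the real `WaitQueue` or `TransactionStatusManager` interfaces; the lookup callback merely stands in for a status check that may require an RPC.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the real YugabyteDB types.
using TxnId = std::string;
enum class TxnStatus { kPending, kCommitted, kAborted };

// Single-threaded sketch: a real wait queue would need synchronization.
class SketchWaitQueue {
 public:
  using ResumeFn = std::function<void()>;
  using StatusLookup = std::function<TxnStatus(const TxnId&)>;

  // Registers a waiter to be resumed once `blocker` is resolved.
  void WaitOn(const TxnId& blocker, ResumeFn resume) {
    assert(resume != nullptr);
    waiters_[blocker].push_back(std::move(resume));
  }

  // Push path: called as soon as a new status for `blocker` is learned
  // locally. Returns the number of waiters resumed.
  std::size_t SignalStatus(const TxnId& blocker, TxnStatus status) {
    if (status == TxnStatus::kPending) return 0;
    auto it = waiters_.find(blocker);
    if (it == waiters_.end()) return 0;
    // Detach the waiters before running them so a resumed waiter may
    // safely register itself again.
    std::vector<ResumeFn> to_resume = std::move(it->second);
    waiters_.erase(it);
    for (auto& resume : to_resume) resume();
    return to_resume.size();
  }

  // Pull path, kept as a safeguard: asks `lookup` about every blocker
  // that still has waiters. Each lookup models a potentially costly check.
  std::size_t Poll(const StatusLookup& lookup) {
    std::vector<TxnId> blockers;
    for (const auto& entry : waiters_) blockers.push_back(entry.first);
    std::size_t resumed = 0;
    for (const auto& blocker : blockers) {
      ++lookups_;
      resumed += SignalStatus(blocker, lookup(blocker));
    }
    return resumed;
  }

  std::size_t lookups() const { return lookups_; }
  std::size_t num_blockers() const { return waiters_.size(); }

 private:
  std::unordered_map<TxnId, std::vector<ResumeFn>> waiters_;
  std::size_t lookups_ = 0;
};
```

In this sketch a status learned locally resumes waiters without any lookup, and `Poll` only has to catch blockers whose signal was missed, which is why its interval could be lengthened.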

See: https://github.com/yugabyte/yugabyte-db/blob/master/src/yb/docdb/wait_queue.cc#L228

Note: we should be careful about which thread we use to resume conflict resolution for any unblocked waiters in this case, and ensure we're not deferring high-priority work in favor of re-running conflict resolution for waiting transactions. See also: #13580
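One way to address the threading concern, shown here only as a sketch under assumptions, is to have the signaling thread enqueue resumptions rather than run them inline, and let a separate worker drain them in bounded batches. `DeferredResumer` and its methods are hypothetical names, and the sketch is single-threaded for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper: collects resumption callbacks so the thread that
// delivers a signal never runs conflict resolution itself.
class DeferredResumer {
 public:
  // Called on the signaling thread: cheap, only records the work.
  void Enqueue(std::function<void()> resume) {
    assert(resume != nullptr);
    pending_.push_back(std::move(resume));
  }

  // Called by a lower-priority worker: runs at most `max_tasks` callbacks
  // in FIFO order and returns how many were run.
  std::size_t Drain(std::size_t max_tasks) {
    std::size_t ran = 0;
    while (ran < max_tasks && !pending_.empty()) {
      std::function<void()> task = std::move(pending_.front());
      pending_.pop_front();
      task();
      ++ran;
    }
    return ran;
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  std::deque<std::function<void()>> pending_;
};
```

Bounding each `Drain` call lets the worker yield between batches so that re-running conflict resolution does not starve other work.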

@robertsami robertsami added the area/docdb YugabyteDB core features label Aug 11, 2022
@robertsami robertsami self-assigned this Aug 11, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Aug 11, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Sep 3, 2022
@robertsami robertsami added this to To do in Wait-Queue Based Locking via automation Dec 8, 2022
@robertsami robertsami moved this from To do to GA Blocking in Wait-Queue Based Locking Jan 17, 2023
@robertsami robertsami moved this from GA Blocking to In progress in Wait-Queue Based Locking Mar 1, 2023
robertsami added a commit that referenced this issue Mar 16, 2023
…t to wait queue

Summary:
When a transaction is committed, the transaction coordinator sends an UpdateTransaction request to each participating tablet. When the transaction participant processes this RPC, we can signal to the wait queue that the transaction was committed, in case the wait queue is managing any waiting transactions blocked on it. This should be more performant than relying on periodic calls to WaitQueue::Poll to detect that a blocker has committed and unblock its waiters.

In case a transaction is aborted, the query layer client will send an UpdateTransaction request with status IMMEDIATE_CLEANUP to every involved transaction participant. In such a case we can similarly signal to the wait queue that this transaction was aborted.
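The two cases above amount to mapping an incoming update to a wait queue signal. The following is a rough sketch under assumptions: all types and names are hypothetical, `kCommitNotification` is a placeholder for whatever status the coordinator's commit-time request carries, and only `kImmediateCleanup` corresponds to a status named in this summary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types; not the real UpdateTransaction protobufs.
enum class UpdateStatus { kCommitNotification, kImmediateCleanup, kOther };
enum class Resolution { kCommitted, kAborted };

struct UpdateRequest {
  std::string txn_id;
  UpdateStatus status;
  uint64_t commit_ht;  // only meaningful for commit notifications
};

struct WaitQueueSignal {
  std::string txn_id;
  Resolution resolution;
  uint64_t commit_ht;
};

// Fills `out` and returns true if the request resolves the transaction.
bool ToWaitQueueSignal(const UpdateRequest& req, WaitQueueSignal* out) {
  assert(out != nullptr);
  switch (req.status) {
    case UpdateStatus::kCommitNotification:
      *out = WaitQueueSignal{req.txn_id, Resolution::kCommitted, req.commit_ht};
      return true;
    case UpdateStatus::kImmediateCleanup:
      *out = WaitQueueSignal{req.txn_id, Resolution::kAborted, 0};
      return true;
    case UpdateStatus::kOther:
      return false;
  }
  return false;
}

// Stand-in participant that records the signals it would forward.
class SketchParticipant {
 public:
  // Returns true if the wait queue would have been signaled.
  bool ProcessUpdate(const UpdateRequest& req) {
    // The participant's normal handling of the request would go here.
    WaitQueueSignal signal{};
    if (!ToWaitQueueSignal(req, &signal)) return false;
    signaled_.push_back(signal);
    return true;
  }

  const std::vector<WaitQueueSignal>& signaled() const { return signaled_; }

 private:
  std::vector<WaitQueueSignal> signaled_;
};
```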

In order to ensure a re-run of conflict resolution sees the latest signaled changes, we also modify the contract between conflict resolution and wait queue code to allow the wait queue to advance the resolution_ht used by conflict resolution beyond the ht of the commit/abort which triggered the waiter to be re-run.
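As an illustration of the relaxed contract only, the sketch below assumes the wait queue remembers the latest hybrid time it has been signaled with and hands a resumed waiter the larger of that value and the hybrid time that triggered the resumption. The class name, the use of a plain integer for hybrid time, and the max-based rule are all assumptions, not a description of the actual change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for a hybrid time; a plain integer is an assumption of this sketch.
using HybridTimeValue = uint64_t;

// Hypothetical helper that chooses the resolution_ht for a resumed waiter.
class ResolutionHtTracker {
 public:
  // Records the hybrid time of a signaled commit or abort.
  void ObserveSignal(HybridTimeValue ht) {
    latest_signaled_ht_ = std::max(latest_signaled_ht_, ht);
  }

  // Returns the resolution_ht for a waiter whose resumption was triggered
  // by a commit/abort at `trigger_ht`. The result is never earlier than
  // `trigger_ht`, but may be later if newer signals have been observed.
  HybridTimeValue ResolutionHtFor(HybridTimeValue trigger_ht) const {
    HybridTimeValue result = std::max(trigger_ht, latest_signaled_ht_);
    assert(result >= trigger_ht);
    return result;
  }

 private:
  HybridTimeValue latest_signaled_ht_ = 0;
};
```

With this rule, a waiter triggered by an older commit still re-runs conflict resolution at a hybrid time that covers every change signaled so far.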

Given these changes, we can achieve high fairness in most normal workloads. For sufficiently contentious workloads, `wait_queue_poll_interval_ms` still needs to be set to a fairly small value to maintain fairness. Immediate follow-up work will reduce this dependency on `wait_queue_poll_interval_ms`: see #16440

This commit includes another change to refactor the wait queue to be owned by the transaction participant to resolve some lifetime issues with this signaling approach.

Test Plan: Jenkins: hot

Reviewers: bkolagani, sergei

Reviewed By: sergei

Subscribers: pjain, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D23614
Wait-Queue Based Locking automation moved this from In progress to Done Mar 16, 2023
Projects
Status: Done