
[DocDB] Ignore statuses from old status tablet on successful transaction promotion #19535

Closed
1 task done
basavaraj29 opened this issue Oct 16, 2023 · 1 comment
Assignees
Labels
2.18 Backport Required 2.20 Backport Required area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments


basavaraj29 commented Oct 16, 2023

Jira Link: DB-8325

Description

For transaction promotion, once we pick a new status tablet, we send `UpdateStatusTabletLocation` RPC(s) to all tablets involved up to that point (holding off the request for a tablet until it has processed an actual write), and start a new heartbeater thread pinging the new status tablet.

When the participant switches the status tablet for a `RunningTransaction`, it should ignore responses to status requests sent to the old status tablet. Otherwise, the participant could wrongly conclude that the transaction is aborted (since the promotion codepath sends an abort to the old status tablet at some point) and clean up the intents. The participant also does not complain at commit time (for other reasons that are themselves necessary for correctness), which leads to data inconsistency in this case. Hence, we should ignore status responses from the old status tablet.

Another issue: the transaction heartbeat thread at the query layer sends and processes heartbeat requests/responses, and for a promoted transaction there can be two threads heartbeating against different status tablets for a period. Once an abort is sent to the old status tablet, we stop sending heartbeat requests to it as well. But we should also not process the response to a request that was sent before the abort but whose response arrived later. Otherwise, we could wrongly conclude that the transaction is aborted (while it was in fact successfully promoted), send cleanup requests to the participants, and return an error code. Hence, we should stop processing heartbeat responses once we have sent an abort to the old status tablet.
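The heartbeat-response gating described above can be sketched as follows. This is a minimal illustration mirroring the `OldTransactionState` enum mentioned later in this issue; the helper function and its name are hypothetical, not the actual YugabyteDB client code:

```cpp
#include <cassert>

// Hypothetical mirror of OldTransactionState; the helper below is an
// illustrative sketch, not the actual client implementation.
enum class OldTransactionState { kNone, kRunning, kAborting, kAborted };

// A heartbeat response from the *old* status tablet should only be
// processed while no abort has been sent to that tablet yet. Once the
// state reaches kAborting/kAborted, late-arriving responses are dropped.
bool ShouldProcessOldStatusTabletResponse(OldTransactionState state) {
  return state == OldTransactionState::kNone ||
         state == OldTransactionState::kRunning;
}
```

The key point is that the check is made when the response arrives, not when the request is sent, so a response to a request issued before the abort is still discarded.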

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@basavaraj29 basavaraj29 added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Oct 16, 2023
@basavaraj29 basavaraj29 self-assigned this Oct 16, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Oct 16, 2023
basavaraj29 added a commit that referenced this issue Oct 19, 2023
…involved in write, Ignore statuses from old status tablet on promotion

Summary:
The diff fixes a few issues with transaction promotion.

1. `tablets_` maintained in `YBTransaction::Impl` is the list of tablets considered "involved" in a transaction, i.e., tablets where the transaction has successfully processed some sort of write, determined by the involved `YBOperation`'s `op` via `op.yb_op->applied() && op.yb_op->should_apply_intents(metadata_.isolation)`. The transaction promotion codepath ideally doesn't send promotion requests to tablets that haven't yet processed a write from this transaction.

In the current implementation, we update `num_completed_batches` for an entry of `tablets_` on every successfully completed operation, regardless of whether it processed a write at the tablet. This wrongly affects the transaction promotion codepath, where promotion requests are sent to all tablets in `transaction_status_move_tablets_` (a consistent copy of `tablets_`) with `num_completed_batches > 0`. When a txn participant sees a promotion request for an unknown transaction, it returns a 40001. In a highly conflicting workload with read committed isolation this is frequent, so we error out with 40001. With the existing code, we would also hit this error when a promotion request is sent to a tablet that only processed a vanilla read (without `row_mark_type` set).

This diff addresses the issue by updating `num_completed_batches` only when `op.yb_op->applied() && op.yb_op->should_apply_intents(metadata_.isolation)` is true, i.e., the op successfully wrote something to the tablet.
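The gating condition in point 1 can be sketched as follows. `OpState` and `TabletState` are illustrative stand-ins for the relevant `YBOperation` and `tablets_` entry fields, not the actual client types:

```cpp
#include <cassert>

// Illustrative stand-ins for op.yb_op->applied() and
// op.yb_op->should_apply_intents(metadata_.isolation).
struct OpState {
  bool applied;
  bool applies_intents;
};

struct TabletState {
  int num_completed_batches = 0;
};

// Old behavior: increment on every completed op. Fixed behavior: only
// count ops that actually wrote something to the tablet, so vanilla
// reads no longer mark the tablet as "involved" for promotion.
void OnOpCompleted(const OpState& op, TabletState* tablet) {
  if (op.applied && op.applies_intents) {
    ++tablet->num_completed_batches;
  }
}
```

With this gating, a tablet that only served a read keeps `num_completed_batches == 0` and is skipped by the promotion codepath.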

2. In the existing implementation, when processing the received status of a `RunningTransaction` at the participant, we don't track whether the response came from the `old_status_tablet` or the current `status_tablet`. For transaction promotion, this leads to additional issues. Consider a transaction that starts off as a local txn, successfully completes writes at a tablet T, and then undergoes promotion. In the process, an abort is sent to the old status tablet. Now if the `RunningTransaction` sees this abort, it initiates a cleanup at the participant. And when the promoted transaction later commits, we don't error out, and the updates get undone, leading to data inconsistency.

This diff addresses the issue by tracking the status tablet on which the status request was initiated. On receiving the status response, if we see that the transaction underwent promotion, we return an already existing old status for the transaction instead of using the newly arrived status. Since the status tablet is now updated, subsequent status requests will get the latest state. Additionally, we reject promotion if there's an active request to abort the transaction.
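The fix in point 2 can be sketched as follows. `RunningTxn` and `ResolveStatus` are hypothetical stand-ins for the actual `RunningTransaction` machinery at the participant:

```cpp
#include <cassert>
#include <string>

enum class TxnStatus { kRunning, kAborted };

// Hypothetical, simplified view of the participant's per-transaction state.
struct RunningTxn {
  std::string status_tablet;  // current (possibly post-promotion) status tablet
  TxnStatus last_known_status = TxnStatus::kRunning;
};

// Each status request records which status tablet it was sent to. If the
// transaction was promoted in the meantime, the response is stale: return
// the already known status instead of the newly arrived one. Subsequent
// requests go to the updated status tablet and fetch the latest state.
TxnStatus ResolveStatus(const RunningTxn& txn,
                        const std::string& request_status_tablet,
                        TxnStatus response_status) {
  if (request_status_tablet != txn.status_tablet) {
    return txn.last_known_status;  // stale response from the old status tablet
  }
  return response_status;
}
```

This is what prevents the abort sent to the old status tablet during promotion from triggering an intent cleanup at the participant.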

3. At the query layer (`client/transaction.cc`), we could have two active heartbeating threads for a promoted transaction. In the existing implementation, we stop sending requests to the old status tablet once we have sent an abort to it (`state != OldTransactionState::kNone && state != OldTransactionState::kRunning`). But if we receive a response to an already sent request, we still do the error handling and proactive cleanup (requests to involved txn participants), even if the abort to the old status tablet was already sent (`state == OldTransactionState::kAborting || state == OldTransactionState::kAborted`). This could lead to unnecessary failures for subsequent operations of the transaction.

This diff addresses the issue by dropping the status response if an abort to the old status tablet was already sent.

Additional context: Holding off promotion requests until the tablet has seen a write was introduced in [[ 21c5918 | commit ]]. Earlier, transaction promotion was being retried on failures which was removed in [[ 1ae14b8 | commit ]] (so we don't retry failed promotion anymore).

Test Plan:
Jenkins
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter GeoPartitionedReadCommittedTest.TestPromotionAmidstConflicts -n 20
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter GeoPartitionedReadCommittedTest.TestParticipantIgnoresAbortFromOldStatusTablet

The first test fails consistently without the changes, and it reproduces all 3 issues in the description.

Elaborating on what the test does: we have a couple of transactions starting off as local transactions by issuing a `PGSQL_WRITE` on a tablet (say T_P1) in partition P1. This is followed by a few more `PGSQL_READ` operations launched against all tablets (2 tablets in P1, 1 tablet in P2, and 1 tablet in P3). These read ops trigger a transaction promotion request. Depending on the order in which these ops get flushed, if the reads to T_P1 and T_P2 get flushed before the promotion requests are sent, the existing code inserts these tablets into the list of "involved tablets". Note that these read ops don't have `row_mark_type` set, so the promotion codepath errors out, returning a 40001 to the backend.

With the changes in the diff, since we now only update `num_completed_batches` when `op.yb_op->applied() && op.yb_op->should_apply_intents(metadata_.isolation)` holds, we shouldn't run into this issue.

We also put up another, simpler test that validates point 2 of the description and fails consistently without the current changes in place:
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter GeoPartitionedReadCommittedTest.TestParticipantIgnoresAbortFromOldStatusTablet

Reviewers: sergei, rsami, rthallam, esheng

Reviewed By: sergei, rsami

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D29252
basavaraj29 added a commit that referenced this issue Oct 23, 2023
…blets that are involved in write, Ignore statuses from old status tablet on promotion

Summary:
Original commit: 29bcfd1 / D29252
(Summary and test plan identical to the original commit above; omitted here.)
Reviewers: sergei, rsami, rthallam, esheng

Reviewed By: rthallam

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29544
basavaraj29 added a commit that referenced this issue Oct 28, 2023
…blets that are involved in write, Ignore statuses from old status tablet on promotion

Summary:
Original commit: 29bcfd1 / D29252
(Summary and test plan identical to the original commit above; omitted here.)

Reviewers: sergei, rsami, rthallam, esheng

Reviewed By: esheng

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29626
@basavaraj29
Contributor Author

Note: this is a correctness fix required to prevent the data inconsistency issue described above, and it has been backported only as far back as 2.18. Geo-partitioning workloads should not be run on versions prior to 2.18, since they could be prone to the data inconsistency issue described above.

basavaraj29 added a commit that referenced this issue Oct 31, 2023
…tablets that are involved in write, Ignore statuses from old status tablet on promotion

Summary:
Original commit: 0880949 / D29626
(Summary and test plan identical to the original commit above; omitted here.)

Reviewers: sergei, rsami, rthallam, esheng

Reviewed By: esheng

Subscribers: yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29787