[YSQL] Upgrade to 2.20.3.0 version fails with inconsistency errors. #22057
Though rows are retrieved with the NULL clause
With count(*), 334 keys are obsolete; without it, there are no obsolete keys.
Summary: The combination of two commits, decb104111 + 6de97906, introduced a compatibility issue between pre-decb104111 and post-6de97906 tserver versions when they run at the same time during an upgrade. With the post-6de97906 logic, when the isolation level is not `SERIALIZABLE_ISOLATION`, `WriteQuery::DoCompleteExecute` can add a WRITE_OP that contains both `write_pairs` and `read_pairs` to the Raft log. The corresponding `read_pair` has its key set to the encoded row key and its value set to `KeyEntryTypeAsChar::kNullLow`. This WRITE_OP is then processed by `TransactionalWriter::Apply`, and the corresponding intents are generated and written to the intents DB. In the post-6de97906 version, `TransactionalWriter::Apply` processes such a `read_pair` and generates the intent `<row> -> kNullLow` of type `[kStrongRead]`. In the pre-decb104111 version, however, `TransactionalWriter::Apply` processes the same `read_pair` differently and generates the intent `<row> -> kNullLow` of type `[kStrongRead, kStrongWrite]`. `ApplyIntentsContext::Entry` then processes the intents, and in the pre-decb104111 version the presence of the `kStrongWrite` type causes this intent to be written into the regular DB as the record `<row> -> kNullLow`, which is an incorrect regular DB record. How DocDB handles this record depends on the WHERE filter and on whether aggregation is present: the affected rows may either be invisible to the user statement or be visible with their non-PK columns set to null. Given that `TransactionalWriter::Apply` only writes `read_pairs` for non-SERIALIZABLE_ISOLATION when the new logic is enabled by `ysql_skip_row_lock_for_update`, the solution is to convert this flag to an auto-flag, so that the new logic is only enabled after all nodes are on the new version.
Also added `PgsqlWriteOperation::use_row_lock_for_update_`, which is initialized in the constructor to avoid changing behaviour within the same `PgsqlWriteOperation` instance (the flag is now a runtime flag, and the auto-flag will change at runtime from false to true).
Jira: DB-10979
Test Plan: Run the TPCC workload in parallel with an upgrade from 2.18.7.0-b38 to a build with this fix incorporated.
Reviewers: pjain, smishra, hsunder, dmitry
Reviewed By: pjain, hsunder, dmitry
Subscribers: yql, ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D34733
Backports (each repeats the summary and test plan above, with Jira: DB-10979, Tags: #jenkins-ready, and Original commit: 987163c / D34733):
- Differential Revision: https://phorge.dev.yugabyte.com/D34856 (Reviewers: pjain, smishra, hsunder, dmitry, rthallam; Reviewed By: pjain; Subscribers: ybase, yql)
- Differential Revision: https://phorge.dev.yugabyte.com/D34857 (Reviewers: pjain, smishra, hsunder, dmitry, rthallam; Reviewed By: pjain; Subscribers: yql, ybase)
- Differential Revision: https://phorge.dev.yugabyte.com/D34858 (Reviewers: pjain, smishra, hsunder, dmitry, rthallam; Reviewed By: pjain; Subscribers: yql, ybase)
- Differential Revision: https://phorge.dev.yugabyte.com/D34865 (Reviewers: pjain, smishra, hsunder, dmitry, rthallam; Reviewed By: pjain; Subscribers: yql, ybase)
- Differential Revision: https://phorge.dev.yugabyte.com/D34868 (Reviewers: pjain, smishra, hsunder, dmitry, rthallam; Reviewed By: pjain; Subscribers: yql, ybase)
- Differential Revision: https://phorge.dev.yugabyte.com/D34916 (Reviewers: smishra, hsunder, dmitry, timur, rthallam; Reviewed By: rthallam; Subscribers: aaruj, ybase, yql)
All backports have landed.
Background: In version 2.20.3.0, the row-locking feature was introduced to fix various issues arising from concurrent updates. Although this fix was added under a gflag, the row-locking feature is on by default in the 2.20.3.0 build. If a cluster is being upgraded from a version lower than 2.20.0.0 (for example, 2.18.0), then during the rolling upgrade different nodes in the cluster can run different versions. As a result, the row-locking feature is enabled on the upgraded nodes and disabled on the remaining nodes, which are not yet upgraded and continue to use the previous form of column-level locking. The nodes on the old version incorrectly handle the writes generated by the new version, which corrupts table data.
Fix: The solution is to convert the existing gflag -
Jira Link: DB-10979
Description
The TPCC workload fails while the cluster is being upgraded from 2.18.7.0-b38 to 2.21.1.0-b90.
16:06:35,834 (Worker.java:546) WARN - The DBMS rejected the transaction without an error code:ERROR: null value in column "w_tax" violates not-null constraint Detail: Failing row contains (999, 2308909.22, null, null, null, null, null, null, null).
Even after the cluster upgrade completes, the workload keeps failing with the same error.
Manual validation showed that, for the warehouse ID that failed in the test, the row's contents were null/empty apart from its first two columns:
yugabyte=# select * from warehouse where w_id=999 LIMIT 1
yugabyte-# ;
 999 | 2304875.01 | | | | | | |
Jenkins logs: http://10.9.9.254:54422/job/TPCC-benchmark/5847/console
Perf studio job: https://perf.dev.yugabyte.com/perfstudio-dashboard/status/5424302
Cluster: http://10.9.131.126/universes/54cea849-a569-4a4b-aae6-4f9ef2123162/nodes
To reproduce the issue, retrigger the Perf Studio job mentioned above.
The same perf studio job passed when the cluster was upgraded from 2.18.7.0-b38 to 2.18.7.0-b40
Perf studio job: https://perf.dev.yugabyte.com/perfstudio-dashboard/status/5430102
Perf report: Link
Issue Type
kind/bug