[DocDB] Correctly set op id when replaying snapshot operations during tablet bootstrap #11946

sanketkedia · 2022-03-31T23:37:59Z

Description

Currently, we don't set the op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. As a result, the same snapshot op, with the same snapshot id, can be replayed a second time. On the master side, we treat this as FATAL and crash. In particular, consider the following case:

1. The cluster is running a pre-2.6 version, where we flush rarely. It adds CREATE_ON_MASTER to the WAL.
2. The cluster is upgraded to 2.6+. During local bootstrap we replay this CREATE_ON_MASTER and add the snapshot record to RocksDB. RocksDB is then flushed, but op.id is not initialized properly, so the flushed frontier does not include the CREATE_ON_MASTER record.
3. The cluster is restarted. It tries to replay CREATE_ON_MASTER again, but the snapshot is already present in the DB, so the master crashes.
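
The three steps above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not YugabyteDB code: `WalEntry`, `ToyTablet`, `flushed_op_index`, and `set_op_id_on_replay` are hypothetical names, and the flushed frontier is reduced to a single integer. It shows why a replay that does not carry its op id leaves the frontier behind the flushed data, so the next restart replays the same op.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Toy model only: every name here is hypothetical, not a YugabyteDB type.
struct WalEntry {
  int64_t op_index;         // position of the operation in the WAL
  std::string snapshot_id;  // snapshot created by a CREATE_ON_MASTER op
};

struct ToyTablet {
  std::set<std::string> snapshots;  // snapshot records already persisted
  int64_t flushed_op_index = 0;     // ops at or below this index are skipped on replay

  // Replays the WAL once, as a bootstrap would. Returns how many duplicate
  // snapshot creations were seen (the condition the master treats as FATAL).
  int Bootstrap(const std::vector<WalEntry>& wal, bool set_op_id_on_replay) {
    int duplicates = 0;
    for (const WalEntry& entry : wal) {
      if (entry.op_index <= flushed_op_index) continue;  // covered by the frontier
      if (!snapshots.insert(entry.snapshot_id).second) ++duplicates;
      // The record itself is persisted immediately; the frontier advances
      // only when the replayed operation carries its op id.
      if (set_op_id_on_replay) flushed_op_index = entry.op_index;
    }
    return duplicates;
  }
};
```

Calling `Bootstrap` twice on the same WAL with `set_op_id_on_replay` set to `false` reports a duplicate on the second call, which corresponds to step 3. With it set to `true`, the second call skips the entry.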

Separately, we should also add logic to bypass the duplicate snapshot request instead of crashing. This would be governed by a gflag. By default the behavior would still be to crash, but anyone who runs into this issue can turn the flag on, complete the bootstrap, and then turn it off again.

In the commit below I added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request.
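
A minimal sketch of what a flag-guarded bypass could look like, under stated assumptions: the plain global `FLAGS_skip_crash_on_duplicate_snapshot` stands in for the real gflag (which would be declared through the gflags library), `ApplyReplayedSnapshot` is a hypothetical helper, and a thrown exception models the FATAL crash. The actual change in the commit may be structured differently.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Stand-in for the gflag named in this issue. A real build would declare it
// through the gflags library rather than as a plain global.
bool FLAGS_skip_crash_on_duplicate_snapshot = false;

// Hypothetical helper: records a replayed snapshot id.
// Returns true if the record was newly added, false if a duplicate was
// skipped. A duplicate with the flag off throws, modeling the FATAL crash.
bool ApplyReplayedSnapshot(std::set<std::string>& snapshots,
                           const std::string& snapshot_id) {
  if (snapshots.insert(snapshot_id).second) return true;     // first time seen
  if (FLAGS_skip_crash_on_duplicate_snapshot) return false;  // bypass enabled
  throw std::runtime_error("duplicate snapshot " + snapshot_id);
}
```

Keeping the default at `false` preserves the existing crash behavior, so the bypass only takes effect when an operator enables it for the duration of a bootstrap.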


…ations during tablet bootstrap
Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:
- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.
Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.
Test Plan: Tested manually by bringing in data and wals from a 2.4.x that has an unflushed CREATE_ON_MASTER op to master and performed steps (2) and (3). Without the fix it crashes, with this fix it doesn't replay. Also, tested that we can bypass without crashing on the above data.
Reviewers: bogdan, sergei
Reviewed By: sergei
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D16314

…g snapshot operations during tablet bootstrap
Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:
- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.
Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.
Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request.
Original commit: fe2b082 / D11946
Test Plan:
Jenkins: rebase: 2.12
Jenkins: urgent
Reviewers: bogdan, sergei
Reviewed By: sergei
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D16336

… snapshot operations during tablet bootstrap
Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:
- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.
Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.
Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request.
Original commit: fe2b082 / D11946
Test Plan:
Jenkins: rebase: 2.8
Jenkins: urgent
Reviewers: bogdan, sergei
Reviewed By: sergei
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D16333

… snapshot operations during tablet bootstrap
Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:
- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.
Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.
Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request.
Original commit: fe2b082 / D11946
Test Plan:
Jenkins: rebase: 2.6
Jenkins: urgent
Reviewers: bogdan, sergei
Reviewed By: sergei
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D16332

…ng snapshot operations during tablet bootstrap
Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:
- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.
Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.
Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request.
Original commit: bcfa4e7 / D16333
Test Plan:
Jenkins: rebase: 2.8.3
Jenkins: urgent
Reviewers: sergei, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D19770

sanketkedia added the area/docdb YugabyteDB core features label Mar 31, 2022

sanketkedia self-assigned this Mar 31, 2022

sanketkedia added this to To do in PITR via automation Mar 31, 2022

jasonriddell pinned this issue Apr 8, 2022

jasonriddell unpinned this issue Apr 8, 2022

sanketkedia closed this as completed Apr 12, 2022

PITR automation moved this from To do to Done Apr 12, 2022
