New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Correctly set op id when replaying snapshot operations during tablet bootstrap #11946
Comments
sanketkedia
added a commit
that referenced
this issue
Apr 1, 2022
…ations during tablet bootstrap Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case: - Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL - Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record - Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes. Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off. Test Plan: Tested manually by bringing in data and wals from a 2.4.x that has an unflushed CREATE_ON_MASTER op to master and performed steps (2) and (3). Without the fix it crashes, with this fix it doesn't replay. Also, tested that we can bypass without crashing on the above data. Reviewers: bogdan, sergei Reviewed By: sergei Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16314
sanketkedia
added a commit
that referenced
this issue
Apr 3, 2022
…g snapshot operations during tablet bootstrap Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case: - Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL - Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record - Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes. Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off. Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request. Original commit: fe2b082 / D11946 Test Plan: Jenkins: rebase: 2.12 Jenkins: urgent Reviewers: bogdan, sergei Reviewed By: sergei Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16336
sanketkedia
added a commit
that referenced
this issue
Apr 3, 2022
… snapshot operations during tablet bootstrap Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case: - Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL - Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record - Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes. Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off. Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request. Original commit: fe2b082 / D11946 Test Plan: Jenkins: rebase: 2.8 Jenkins: urgent Reviewers: bogdan, sergei Reviewed By: sergei Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16333
sanketkedia
added a commit
that referenced
this issue
Apr 4, 2022
… snapshot operations during tablet bootstrap Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case: - Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL - Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record - Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes. Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off. Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request. Original commit: fe2b082 / D11946 Test Plan: Jenkins: rebase: 2.6 Jenkins: urgent Reviewers: bogdan, sergei Reviewed By: sergei Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16332
sanketkedia
added a commit
that referenced
this issue
Sep 26, 2022
…ng snapshot operations during tablet bootstrap Summary: Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case: - Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL - Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record - Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes. Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off. Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a duplicate snapshot request. Original commit: bcfa4e7 / D16333 Test Plan: Jenkins: rebase: 2.8.3 Jenkins: urgent Reviewers: sergei, bogdan Reviewed By: bogdan Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D19770
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Description
Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:
CREATE_ON_MASTER
to the WALCREATE_ON_MASTER
and add snapshot record to RocksDB. And flush RocksDB, butop.id
is not initialized properly so it does not containCREATE_ON_MASTER
recordCREATE_ON_MASTER
again, but it is already present in DB, so the master crashes.Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.
In the commit below I added a gflag
skip_crash_on_duplicate_snapshot
with a default value offalse
that can be changed in order to bypass crashing on a duplicate snapshot request.The text was updated successfully, but these errors were encountered: