Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Correctly set op id when replaying snapshot operations during tablet bootstrap #11946

Closed
sanketkedia opened this issue Mar 31, 2022 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features
Projects

Comments

@sanketkedia
Copy link
Contributor

sanketkedia commented Mar 31, 2022

Description

Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:

  • Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
  • Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
  • Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.

Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.

In the commit below I added a gflag skip_crash_on_duplicate_snapshot with a default value of false that can be changed in order to bypass crashing on a duplicate snapshot request.

@sanketkedia sanketkedia added the area/docdb YugabyteDB core features label Mar 31, 2022
@sanketkedia sanketkedia self-assigned this Mar 31, 2022
@sanketkedia sanketkedia added this to To do in PITR via automation Mar 31, 2022
sanketkedia added a commit that referenced this issue Apr 1, 2022
…ations during tablet bootstrap

Summary:
Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:

- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.

Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.

Test Plan:
Tested manually by bringing in data and wals from a 2.4.x that has an unflushed CREATE_ON_MASTER
op to master and performed steps (2) and (3). Without the fix it crashes, with this fix it doesn't
replay.

Also, tested that we can bypass without crashing on the above data.

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16314
sanketkedia added a commit that referenced this issue Apr 3, 2022
…g snapshot operations during tablet bootstrap

Summary:
Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:

- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.

Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.

Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a
duplicate snapshot request.

Original commit: fe2b082 / D11946

Test Plan:
Jenkins: rebase: 2.12
Jenkins: urgent

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16336
sanketkedia added a commit that referenced this issue Apr 3, 2022
… snapshot operations during tablet bootstrap

Summary:
Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:

- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.

Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.

Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a
duplicate snapshot request.

Original commit: fe2b082 / D11946

Test Plan:
Jenkins: rebase: 2.8
Jenkins: urgent

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16333
sanketkedia added a commit that referenced this issue Apr 4, 2022
… snapshot operations during tablet bootstrap

Summary:
Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:

- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.

Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.

Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a
duplicate snapshot request.

Original commit: fe2b082 / D11946

Test Plan:
Jenkins: rebase: 2.6
Jenkins: urgent

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16332
@jasonriddell jasonriddell pinned this issue Apr 8, 2022
@jasonriddell jasonriddell unpinned this issue Apr 8, 2022
PITR automation moved this from To do to Done Apr 12, 2022
sanketkedia added a commit that referenced this issue Sep 26, 2022
…ng snapshot operations during tablet bootstrap

Summary:
Currently, we don't set op id when replaying snapshot operations during tablet bootstrap. However, we flush the snapshot entry into the sys catalog immediately after the create succeeds and also update the frontier. This can cause a duplicate replay of the same snapshot op with the same snapshot id. On the master side, we treat this as FATAL and crash. In particular, consider the following case:

- Cluster is running pre 2.6, where we flush rarely. It adds CREATE_ON_MASTER to the WAL
- Cluster is updated to 2.6+, during local bootstrap we replay this CREATE_ON_MASTER and add snapshot record to RocksDB. And flush RocksDB, but op.id is not initialized properly so it does not contain CREATE_ON_MASTER record
- Cluster is restarted, it tries to replay CREATE_ON_MASTER again, but it is already present in DB, so the master crashes.

Separately, we should also add some logic to bypass the duplicate snapshot request and not crash. This would be governed by a GFlag. By default, the behavior would be to crash but if someone runs into this issue then they can turn it on and complete the bootstrap and then again turn it off.

Added a gflag `skip_crash_on_duplicate_snapshot` with a default value of `false` that can be changed in order to bypass crashing on a
duplicate snapshot request.

Original commit: bcfa4e7 / D16333

Test Plan:
Jenkins: rebase: 2.8.3
Jenkins: urgent

Reviewers: sergei, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D19770
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features
Projects
PITR
Done
Development

No branches or pull requests

1 participant