[DocDB] PITR: Restore should ignore deleted namespaces #17887

Closed
sanketkedia opened this issue Jun 21, 2023 · 0 comments
Labels: area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), priority/medium (Medium priority issue)

sanketkedia commented Jun 21, 2023

Jira Link: DB-6967

Description

We recently observed this while testing xCluster with PITR. Consider the following scenario:

  1. Create database demo.
  2. Drop the database created in (1).
  3. Create database demo again.
  4. Create a snapshot schedule on demo via the YBA UI.
  5. Restore this schedule to any time: we get a recvmsg error and the master crashes.

The reason is that the snapshot schedule's filter (which specifies the objects the schedule covers) is based on the database name: the user says something like "create a schedule on ysql.demo with interval x and retention y" rather than "create a schedule on database id a with interval x and retention y". The problem is that YBA passes this user-entered filter to the master as-is, so the master records it with only the database name. Thus, when we try to restore in step (5), two namespaces match the filter (one DELETED and one RUNNING), and the SCHECK fails because it expects exactly one database.

On the other hand, if you create the schedule via yb-admin instead of the YBA UI, the yb-admin client is responsible for calling ListNamespaces, which returns the id of this database, so the filter it sends to the master includes the database id along with its name. The restore is therefore able to resolve the correct database.
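To illustrate the two paths, here is a minimal, hypothetical Python model of a name-only filter versus a name-plus-id filter. The types and functions (`Namespace`, `NamespaceFilter`, `match_namespaces`, `resolve_for_restore`) are illustrative assumptions, not the master's actual code; `resolve_for_restore` only stands in for the SCHECK that expects exactly one database.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    RUNNING = "RUNNING"
    DELETED = "DELETED"


@dataclass(frozen=True)
class Namespace:
    ns_id: str
    name: str
    state: State


@dataclass(frozen=True)
class NamespaceFilter:
    # Always present: the database name, e.g. "demo" from "ysql.demo".
    name: str
    # Set only when the client resolved the id first (as yb-admin does via
    # ListNamespaces); None when the name is passed through as-is (YBA path).
    ns_id: Optional[str] = None


def match_namespaces(catalog: List[Namespace], flt: NamespaceFilter) -> List[Namespace]:
    """Return every namespace, in any state, that the filter matches."""
    return [
        ns
        for ns in catalog
        if ns.name == flt.name and (flt.ns_id is None or ns.ns_id == flt.ns_id)
    ]


def resolve_for_restore(catalog: List[Namespace], flt: NamespaceFilter) -> Namespace:
    """Hypothetical stand-in for the restore-time check expecting one database."""
    matches = match_namespaces(catalog, flt)
    if len(matches) != 1:
        raise RuntimeError(
            f"expected 1 namespace matching {flt.name!r}, found {len(matches)}"
        )
    return matches[0]


# Steps 1-3 of the scenario: "demo" is created, dropped, and created again.
catalog = [
    Namespace("id-old", "demo", State.DELETED),
    Namespace("id-new", "demo", State.RUNNING),
]

# Name-only filter (YBA path): both entries match, so the check would fail.
print(len(match_namespaces(catalog, NamespaceFilter("demo"))))
# Name + id filter (yb-admin path): exactly one entry matches.
print(resolve_for_restore(catalog, NamespaceFilter("demo", "id-new")).ns_id)
```

Because the name-only filter also matches the DELETED entry, calling `resolve_for_restore(catalog, NamespaceFilter("demo"))` raises in this model, mirroring the failure in step (5).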

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
sanketkedia added the area/docdb and status/awaiting-triage labels on Jun 21, 2023
sanketkedia self-assigned this on Jun 21, 2023
sanketkedia added this to To do in PITR via automation on Jun 21, 2023
yugabyte-ci added the kind/bug and priority/medium labels on Jun 21, 2023
sanketkedia added a commit that referenced this issue Jun 27, 2023
Summary:
We recently observed this while testing xCluster with PITR. Consider the following scenario:

1. Create database demo.
2. Drop the database created in (1).
3. Create database demo again.
4. Create a snapshot schedule on demo via the YBA UI.
5. Restore this schedule to any time: we get a recvmsg error and the master crashes.

The reason is that the snapshot schedule's filter (which specifies the objects the schedule covers) is based on the database name: the user says something like "create a schedule on ysql.demo with interval x and retention y" rather than "create a schedule on database id a with interval x and retention y". The problem is that YBA passes this user-entered filter to the master as-is, so the master records it with only the database name. Thus, when we try to restore in step (5), two namespaces match the filter (one DELETED and one RUNNING), and the SCHECK fails because it expects exactly one database.

On the other hand, if you create the schedule via yb-admin instead of the YBA UI, the yb-admin client is responsible for calling ListNamespaces, which returns the id of this database, so the filter it sends to the master includes the database id along with its name. The restore is therefore able to resolve the correct database.

One potential fix is to have the master modify the snapshot schedule request so that it also persists the namespace id alongside the other details, ensuring uniqueness. Existing snapshot schedules will also need to be patched, which requires a migration; that will be tackled in a separate diff.
Jira: DB-6967

Test Plan: ybd --cxx_test snapshot-schedule-test --gtest-filter SnapshotScheduleTest.DeletedNamespace

Reviewers: zdrudi, mhaddad, slingam

Reviewed By: zdrudi, slingam

Subscribers: slingam, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D26345
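The fix proposed in this commit summary (having the master persist the namespace id alongside the database name when a schedule is created) can be sketched as follows. This is a hypothetical Python model, not YugabyteDB's implementation: `Namespace`, `ScheduleFilter`, `bind_filter_on_create`, and the choice to bind a name-only filter to the single running namespace are all assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Namespace:
    ns_id: str
    name: str
    running: bool  # False once the database has been dropped (DELETED).


@dataclass(frozen=True)
class ScheduleFilter:
    name: str
    ns_id: Optional[str] = None


def bind_filter_on_create(catalog: List[Namespace], flt: ScheduleFilter) -> ScheduleFilter:
    """Hypothetical master-side step run when a snapshot schedule is created."""
    if flt.ns_id is not None:
        # The client already resolved the id (e.g. yb-admin); persist as-is.
        return flt
    # Name-only request (e.g. from YBA): bind it to the single running namespace.
    live = [ns for ns in catalog if ns.name == flt.name and ns.running]
    if len(live) != 1:
        raise ValueError(
            f"expected 1 running namespace named {flt.name!r}, found {len(live)}"
        )
    return replace(flt, ns_id=live[0].ns_id)


catalog = [
    Namespace("id-old", "demo", running=False),
    Namespace("id-new", "demo", running=True),
]
persisted = bind_filter_on_create(catalog, ScheduleFilter("demo"))
print(persisted)  # ScheduleFilter(name='demo', ns_id='id-new')
```

Binding at creation time means a later restore can match on the persisted id, so a DELETED namespace that reused the same name no longer produces a second match. Patching schedules created before such a change would still need the separate migration mentioned above.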
yugabyte-ci removed the status/awaiting-triage label on Jun 27, 2023
sanketkedia added a commit that referenced this issue Jun 27, 2023
…spaces

Summary:
Original commit: 52480dd / D26345
Jira: DB-6967

Test Plan: ybd --cxx_test snapshot-schedule-test --gtest-filter SnapshotScheduleTest.DeletedNamespace

Reviewers: zdrudi, mhaddad, slingam

Reviewed By: zdrudi

Subscribers: bogdan, ybase, slingam

Differential Revision: https://phorge.dev.yugabyte.com/D26498
sanketkedia added a commit that referenced this issue Jun 27, 2023
…mespaces

Summary:
Original commit: 52480dd / D26345
Jira: DB-6967

Test Plan: ybd --cxx_test snapshot-schedule-test --gtest-filter SnapshotScheduleTest.DeletedNamespace

Reviewers: zdrudi, mhaddad, slingam

Reviewed By: zdrudi

Subscribers: slingam, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D26499
PITR automation moved this from To do to Done Jun 29, 2023
dr0pdb pushed a commit to dr0pdb/yugabyte-db that referenced this issue Jul 6, 2023
Projects: PITR (Done)