[DocDB] PITR: Restore should ignore deleted namespaces #17887
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
sanketkedia added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels on Jun 21, 2023
yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels on Jun 21, 2023
sanketkedia added a commit that referenced this issue on Jun 27, 2023
Summary:
Observed this recently while testing xCluster with PITR. Consider the following scenario:
1. Create database demo
2. Drop the database created in (1)
3. Create database demo again
4. Create a snapshot schedule on demo via the YBA UI
5. Restore this schedule to any time: this returns a recvmsg error and the master crashes

The cause is that the snapshot schedule filter (which specifies the objects the schedule covers) is based on the database name: the user asks for a schedule on ysql.demo with interval x and retention y, rather than a schedule on database id a with interval x and retention y. YBA passes this user-entered filter to the master unchanged, so the master records it with only the database name. When we then try to restore in (5), two namespaces match the filter (one DELETED and one RUNNING), and the SCHECK fails because it expects exactly one database.

By contrast, if you create the schedule via yb-admin instead of the YBA UI, the yb-admin client queries ListNamespaces, which returns the id of this database, so the filter it sends to the master contains the database id as well as its name. The restore can therefore resolve the correct database.

One potential fix is to let the master modify the snapshot schedule request so that it also persists the namespace id alongside the other details, which ensures uniqueness. Existing snapshot schedules will also need to be patched; that requires a migration, which will be tackled in a separate diff.

Jira: DB-6967
Test Plan: ybd --cxx_test snapshot-schedule-test --gtest-filter SnapshotScheduleTest.DeletedNamespace
Reviewers: zdrudi, mhaddad, slingam
Reviewed By: zdrudi, slingam
Subscribers: slingam, ybase, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D26345
sanketkedia added a commit that referenced this issue on Jun 27, 2023
…spaces
Summary: Original commit: 52480dd / D26345
Jira: DB-6967
Test Plan: ybd --cxx_test snapshot-schedule-test --gtest-filter SnapshotScheduleTest.DeletedNamespace
Reviewers: zdrudi, mhaddad, slingam
Reviewed By: zdrudi
Subscribers: bogdan, ybase, slingam
Differential Revision: https://phorge.dev.yugabyte.com/D26498
sanketkedia added a commit that referenced this issue on Jun 27, 2023
…mespaces
Summary: Original commit: 52480dd / D26345
Jira: DB-6967
Test Plan: ybd --cxx_test snapshot-schedule-test --gtest-filter SnapshotScheduleTest.DeletedNamespace
Reviewers: zdrudi, mhaddad, slingam
Reviewed By: zdrudi
Subscribers: slingam, ybase, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D26499
dr0pdb pushed a commit to dr0pdb/yugabyte-db that referenced this issue on Jul 6, 2023
Differential Revision: https://phorge.dev.yugabyte.com/D26345
Jira Link: DB-6967
Description
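Observed this recently while testing xCluster with PITR. Consider the following scenario:
1. Create database demo
2. Drop the database created in (1)
3. Create database demo again
4. Create a snapshot schedule on demo via the YBA UI
5. Restore this schedule to any time: this returns a recvmsg error and the master crashes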
The cause is that the snapshot schedule filter (which specifies the objects the schedule covers) is based on the database name: the user asks for a schedule on ysql.demo with interval x and retention y, rather than a schedule on database id a with interval x and retention y. YBA passes this user-entered filter to the master unchanged, so the master records it with only the database name. When we then try to restore in (5), two namespaces match the filter (one DELETED and one RUNNING), and the SCHECK fails because it expects exactly one database.
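The ambiguity can be illustrated with a small Python model. This is only a sketch, not the master's actual code: `Namespace`, `NamespaceFilter`, `match`, `resolve_single`, and the ids `ns-old` / `ns-new` are hypothetical names, and the `ValueError` stands in for the failing SCHECK. It shows that a name-only filter matches both the DELETED and the RUNNING entry, and that skipping deleted namespaces leaves exactly one match.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    RUNNING = "RUNNING"
    DELETED = "DELETED"


@dataclass(frozen=True)
class Namespace:
    id: str
    name: str
    state: State


@dataclass(frozen=True)
class NamespaceFilter:
    # Hypothetical model of a schedule filter: the id is absent when the
    # filter was recorded from a bare database name.
    name: str
    id: Optional[str] = None


def match(catalog: List[Namespace], flt: NamespaceFilter,
          ignore_deleted: bool) -> List[Namespace]:
    """Return every catalog entry that the filter selects."""
    selected = []
    for ns in catalog:
        if ignore_deleted and ns.state is State.DELETED:
            continue
        if flt.id is not None:
            if ns.id == flt.id:
                selected.append(ns)
        elif ns.name == flt.name:
            selected.append(ns)
    return selected


def resolve_single(catalog: List[Namespace], flt: NamespaceFilter,
                   ignore_deleted: bool) -> Namespace:
    """Stand-in for the check that expects exactly one database."""
    found = match(catalog, flt, ignore_deleted)
    if len(found) != 1:
        raise ValueError(f"expected 1 namespace, found {len(found)}")
    return found[0]


# The catalog after steps (1) to (3): same name, two different ids.
catalog = [
    Namespace("ns-old", "demo", State.DELETED),
    Namespace("ns-new", "demo", State.RUNNING),
]
```

With this catalog, `resolve_single(catalog, NamespaceFilter("demo"), ignore_deleted=False)` raises because two entries match, while passing `ignore_deleted=True` returns the RUNNING namespace.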
By contrast, if you create the schedule via yb-admin instead of the YBA UI, the yb-admin client queries ListNamespaces, which returns the id of this database, so the filter it sends to the master contains the database id as well as its name. The restore can therefore resolve the correct database.
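A rough Python sketch of that idea, resolving the name to an id when the schedule is created so that the stored filter is unambiguous. All names here (`ScheduleFilter`, `lookup_live_id`, `pin_filter`, `matches`, the ids) are hypothetical, and the lookup is assumed to return only live namespaces; it is a stand-in for the ListNamespaces query, not its real interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Namespace:
    id: str
    name: str
    deleted: bool = False


@dataclass(frozen=True)
class ScheduleFilter:
    name: str
    id: Optional[str] = None


def lookup_live_id(catalog: List[Namespace], name: str) -> str:
    """Hypothetical stand-in for a namespace lookup (assumed: live entries only)."""
    live = [ns.id for ns in catalog if ns.name == name and not ns.deleted]
    if len(live) != 1:
        raise LookupError(f"cannot resolve namespace {name!r}")
    return live[0]


def pin_filter(catalog: List[Namespace], requested: ScheduleFilter) -> ScheduleFilter:
    """Attach the namespace id to a name-only filter before it is stored."""
    if requested.id is not None:
        return requested
    return ScheduleFilter(requested.name, lookup_live_id(catalog, requested.name))


def matches(catalog: List[Namespace], flt: ScheduleFilter) -> List[Namespace]:
    """Select by id when the filter has one, otherwise by name."""
    if flt.id is not None:
        return [ns for ns in catalog if ns.id == flt.id]
    return [ns for ns in catalog if ns.name == flt.name]


catalog = [
    Namespace("ns-old", "demo", deleted=True),
    Namespace("ns-new", "demo"),
]
pinned = pin_filter(catalog, ScheduleFilter("demo"))
```

Here `matches(catalog, pinned)` selects only the live `demo` entry, whereas the unpinned `ScheduleFilter("demo")` still selects both entries.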