-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docdb][ysql] DROP DATABASE leaves master Table objects in RUNNING state on disk #2997
Labels
Comments
bmatican
added
area/ysql
Yugabyte SQL (YSQL)
area/docdb
YugabyteDB core features
labels
Nov 22, 2019
jaki
pushed a commit
to jaki/yugabyte-db
that referenced
this issue
Dec 13, 2019
Summary: Since a116cd2, we introduced a subtle bug where we were only marking tables as DELETING in memory, rather than also saving it in raft. Technically, there is a background job that would then finally flip them from DELETING to DELETED, once all tablets were also deleted, which would in fact write to raft. However, a fault / master leader failover could cause these user tables to never have their raft state updated, so on failover, they'd come back as RUNNING. Separately though, the system tables were never deleted anyway, which would leave TableInfo objects that are RUNNING, but belong to no currently live database. Test Plan: `mvn test -Dtest=org.yb.pgsql.TestPgCatalogPersistence#testDropDatabase` This fixes the test, but breaks on shutdown with: ``` org.postgresql.util.PSQLException: ERROR: Not found: GetTableLocations(template1.pg_shdepend, , 1) failed: No such tablet found ``` Reviewers: mihnea, nicolas, mikhail Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D7616 (cherry picked from commit 51d659966a6923a0513e36bc05294047dc9cd323)
jaki
pushed a commit
that referenced
this issue
Dec 16, 2019
Summary: Since commit a116cd2, we introduced a subtle bug where we were only marking tables as `DELETING` in memory rather than also saving to raft. Technically, there is a background job that finally flips them from `DELETING` to `DELETED` once all tablets are also deleted, and this, in fact, writes to raft. However, a fault or master leader failover can cause these user tables to never have their raft state updated, so, on failover, they'd come back as `RUNNING`. Separately, the system tables are never deleted anyway, leaving TableInfo objects that are `RUNNING` without a live database. Fix the first bug by switching the ordering of marking tables as `DELETING` in memory and saving to raft. Fix the second bug by 1. Removing system tables just like user tables without removing the system catalog tablet 1. Removing system tables corresponding to the database from the `table_ids` field of the system catalog tablet entry 1. Running background tasks when dropping an empty database (has no user tables) Create new tests: * `PgMiniTest.DropDBMarkDeleted` * `PgMiniTest.DropDBUpdateSysTablet` * `PgMiniTest.DropDBWithTables` Perform other minor changes: * Add `PgMiniTest::ConnectToDB` function * Add helper function `CatalogManager::RemoveTableIdsFromTabletInfo` * Add helper function `PersistentTableInfo::is_deleting` * Change the source file signature of `TableInfo::GetAllTablets` to use `TabletInfos` over the expanded `vector<...>`, matching the header file's signature * Fix a comment typo in `CatalogManager::AbortTableCreation` * Remove some useless comments in `PgMiniTest::RowLockConflictMatrix` (Credits to @bmatican for partially authoring this commit.) Test Plan: * `./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter` * `PgMiniTest.DropDBMarkDeleted` * `PgMiniTest.DropDBUpdateSysTablet` * `PgMiniTest.DropDBWithTables` * Manual `DROP DATABASE` test: 1. `./bin/ysqlsh -c "CREATE DATABASE d;"` 1. `./bin/ysqlsh -c "DROP DATABASE d;"` 1. Kill and respawn master leader process to elect new master leader 1. Make sure `./bin/cqlsh -e "SELECT * FROM system.partitions;"` runs without errors Reviewers: nicolas, mikhail, neha, bogdan, mihnea, hector Reviewed By: bogdan, hector Subscribers: hector, neha, jason, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D7616
This issue was closed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
a116cd2 introduced a bug where we're not actually saving the DELETED state for user tables into raft, but just modifying it in memory for that particular master leader. Thankfully, there is a background thread responsible for moving tables from DELETING to DELETED once all their tablets are marked as DELETED. However, there is a time gap until that happens.
As such, if we suffer a master failure / leadership change, in the middle of the DROP DATABASE operation, before the background thread can clean up, then these tables COULD still be around as RUNNING.
Separately, while investigating this issue, I also noticed two other things:
kSysCatalogTabletId
does not clean up system table IDs, from deleted databases, from it'stable_ids
PB field.cc @nspiegelberg @m-iancu
The text was updated successfully, but these errors were encountered: