
Tracking issue: safe, lazy, managed DROP TABLE #6689

Open
shlomi-noach opened this issue Sep 10, 2020 · 7 comments

@shlomi-noach
Contributor

shlomi-noach commented Sep 10, 2020

DROP TABLE is a risky MySQL operation in production. There seem to be multiple components involved, the major one being that if the table has pages in InnoDB's buffer pool (one or many), then the buffer pool is locked for the duration of the DROP. The duration of the DROP is also related to the time it takes the operating system to delete the .ibd file associated with the table (assuming innodb_file_per_table).

It is noteworthy that the problem exists in particular on the primary (master) node; replicas are not affected as much.

Different companies solve DROP TABLE in different ways. An interesting discussion can be found on gh-ost's repo and in MySQL bug reports.

Solutions differ in implementation, but all suggest waiting for some time before actually dropping the table. That alone requires management around DROP TABLE operations. As explained below, waiting enables reverting the operation.

Vitess should automate table drops and make the problem as transparent to the user as possible. A breakdown of the suggested solution follows.

Illustrating the DROP steps

We can make the DROP management stateful or stateless. I opt for stateless: no meta tables to describe the progress of the DROP. The state should be inferred from the tables themselves. Specifically, we will encode hints in the table names.

We wish to manage DROP requests. Most managed DROP requests will wait before destroying data. If the user issued a DROP TABLE only to realize the app still expects the table to exist, then we make it possible to revert the operation.

This is done by first issuing a RENAME TABLE my_table TO something_else. To the app, it seems like the table is gone; but the user may easily restore it by running the revert query: RENAME TABLE something_else TO my_table.

That something_else name can be e.g. _vt_HOLD_2201058f_f266_11ea_bab4_0242c0a8b007_20200910113042.
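
For illustration, combining the two statements above with that example name:

RENAME TABLE my_table TO _vt_HOLD_2201058f_f266_11ea_bab4_0242c0a8b007_20200910113042;
-- revert, while the table is still on hold:
RENAME TABLE _vt_HOLD_2201058f_f266_11ea_bab4_0242c0a8b007_20200910113042 TO my_table;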

At some point we decide that we can destroy the data. The "hold" period can either be determined by Vitess or explicitly by the user. For example, on successful completion of a schema migration, Vitess can choose to purge the "old" table right away.
At that stage we rename the table to e.g. _vt_PURGE_63b5db0c_f25c_11ea_bab4_0242c0a8b007_20200911070228.
A table by that name is eligible to have its data purged.

In my experience (see the gh-ost issue above), a safe method to purge data is to slowly remove rows until the table is empty. Note, however, that the DELETE statement actually causes table pages to be loaded into the buffer pool, which works against us.
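
For illustration, the purge could be a repeated small DELETE, e.g. (a sketch only; the chunk size of 50 is an arbitrary assumption, not necessarily what Vitess will use):

-- repeat until the statement affects 0 rows, throttling between chunks
DELETE FROM _vt_PURGE_63b5db0c_f25c_11ea_bab4_0242c0a8b007_20200911070228 LIMIT 50;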

Once all rows are purged from a table, we rename it again to e.g. _vt_DROP_8a797518_f25c_11ea_bab4_0242c0a8b007_20200911234156. At this time we point out that 20200911234156 is actually a readable timestamp, and stands for 2020-09-11 23:41:56. That timestamp can tell us when the table was last renamed.

Vitess can then run an actual DROP TABLE for _vt_DROP_... tables whose timestamp is older than, say, 2 days. As mentioned above, purging the table actually caused its pages to be loaded into the buffer pool, and we need to wait for them to be naturally evicted before dropping it.
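
As a sketch, finding _vt_DROP_... tables that are old enough could look like the following query (this assumes the 14-digit timestamp suffix shown above; the actual implementation may parse names differently):

SELECT table_name
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name LIKE '\_vt\_DROP\_%'
  -- the trailing 14 digits are the rename timestamp, e.g. 20200911234156
  AND SUBSTRING(table_name, -14) < DATE_FORMAT(NOW() - INTERVAL 2 DAY, '%Y%m%d%H%i%S');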

Asking for a safe DROP

We can introduce a new DROP TABLE syntax. The user will issue DROP TABLE with hints to Vitess. Examples could be:

  • DROP IN '24h' TABLE my_table: my_table is renamed to _vt_HOLD_..., where it will spend at least 24h before transitioning to _vt_PURGE_.... The user will have 24h to regret and revert the DROP.
  • DROP IN '30min' TABLE my_table: same, but with only 30min
  • DROP NOWAIT TABLE my_table: the user indicates they're sure they won't regret the DROP. We save time and immediately rename to _vt_PURGE_...
  • DROP DEFAULT TABLE my_table: Vitess chooses the HOLD time (e.g. 3 days)

Vitess can internally choose to drop tables; my immediate example is automated schema migrations. Whether successful or failed, it's generally safe to purge the artifact tables immediately.

@shlomi-noach
Contributor Author

The actual implementation will be in vttablet. A primary vttablet will routinely check for tables that need work:

  • A once-per-hour check is enough; we are going to spend days dropping a table, so there is no need to check frequently
  • Look for tables called _vt_(HOLD|PURGE|DROP)_<UUID>_<TIMESTAMP> (be very strict about the pattern search, because why not); a sketch query follows below this list
  • If the time is right, RENAME to the next step (HOLD renames to PURGE, PURGE renames to DROP)
  • Continuously purge rows from tables in PURGE state. We may want to purge only one table at a time, e.g. purge the oldest table until it is empty, immediately rename it to DROP state, then move on to the next table and begin purging it.
    • Use throttler in between purge chunks
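
As a sketch of the pattern search mentioned above (an illustration only; the regular expression is an assumption based on the example names, and a real implementation would likely be stricter about the UUID portion):

SELECT table_name
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name REGEXP '^_vt_(HOLD|PURGE|DROP)_[0-9a-f_]+_[0-9]{14}$';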

This means a user-initiated DROP ... TABLE on a sharded cluster is handled by each shard independently. The data is purged at different times and the disk space is reclaimed at different times; each shard will do its best, but workloads will likely differ somewhat, making the operations run faster on some shards and slower on others.

There just might be a limbo state, where the user e.g. runs DROP IN '24h' TABLE my_table and after 24h20m chooses to "undrop" it; some shards might have already started purging data, while others haven't. We may need a 2PC-like mechanism to ensure the "undrop" is atomic across shards: either all are able to restore the original table, or we prevent those that can from doing so.

@shlomi-noach
Contributor Author

shlomi-noach commented Sep 14, 2020

To standardize statements, perhaps:

DROP IN '24h' TABLE my_table;
DROP IN NOWAIT TABLE my_table;
DROP IN DEFAULT TABLE my_table;

@DeathBorn
Contributor

In some cases, performing incremental row deletes might not be needed and could be skipped.

@shlomi-noach
Contributor Author

We need to decide whether to allow alternate methods/flows for dropping tables:

  • some users just wait and drop
  • some need to purge rows
  • some operate independently on replicas.

Taking, for example, the request above (skip the purging process): do we support that, and if so, do we set it up with a command-line flag? Or as part of the DROP statement? Which method does Vitess pick for self-induced DROPs, namely dropping old tables after an online schema migration?

@shlomi-noach
Contributor Author

shlomi-noach commented Sep 16, 2020

I think I've found a nice flow that is also flexible and adjustable to fit different users' needs.

The general table lifecycle flow is this, and in this order:

  1. Real table
  2. HOLD
  3. PURGE
  4. EVAC
  5. DROP
  6. gone

Vitess will transition the table through these states, in order. But it will also support skipping some states. Let's first explain the meaning of the states:

Real table

Well, this is obvious: the table is in use by the app.

HOLD

Table is renamed to something like _vt_HOLD_6ace8bcef73211ea87e9f875a4d24e90_20200915120410. Vitess will not make changes to the table and will not drop data. The table is put away for safekeeping for X hours/days. If it turns out the app still needs the table, the user can RENAME it back to its original name, taking it out of this game.

PURGE

Table renamed to e.g. _vt_PURGE_6ace8bcef73211ea87e9f875a4d24e90_20200916080539. Vitess purges (or will purge, based on workload and prior engagements) rows from PURGE tables. Data is lost and the user may not resurrect the table anymore.
Most likely we will settle for SQL_LOG_BIN=0, i.e. purging will not go through replication. The replicas are not as badly affected by DROP statements as the primary.
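
A minimal sketch of a purge chunk with binary logging disabled for the session (an illustrative assumption; the eventual DROP TABLE would still replicate, which is acceptable since replicas are less affected):

SET SESSION sql_log_bin = 0;
DELETE FROM _vt_PURGE_6ace8bcef73211ea87e9f875a4d24e90_20200916080539 LIMIT 50;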

EVAC

Table renamed to e.g. _vt_EVAC_6ace8bcef73211ea87e9f875a4d24e90_20200918192031. The table sits still for Y hours/days. I'm thinking this period will be pre-defined by Vitess. The purpose of this state is to wait a reasonable amount of time so that the table's pages are evacuated from the InnoDB buffer pool by the natural succession of production IO/memory activity.

DROP

Table renamed to e.g. _vt_DROP_6ace8bcef73211ea87e9f875a4d24e90_20200921193202. Vitess will DROP TABLE this table imminently.

gone

end of lifecycle

Transitioning and skipping of states

The above lifecycle will be the default, and safest, cycle. It is also the longest. Some users must use this full cycle (in my experience, I've seen cases where dropping a single-row table would lock production for minutes). Others have no issue dropping a table with millions of rows, and don't want to pay the IO+time cost of purging the data.

We can introduce a vttablet command line flag: -table_gc_lifecycle="hold,purge,evac,drop"; that's the default value. drop is implicit: if you don't specify it, it's automatically appended. Otherwise, consider the following flows:

  • some people are just happy to drop tables with no wait, no purge, nothing; they will set -table_gc_lifecycle="drop"
  • some people want to keep the table around for a few days, then just go ahead and drop it: -table_gc_lifecycle="hold,drop"
  • some want to keep the table, then purge it, then immediately drop: -table_gc_lifecycle="hold,purge,drop"

I think this method is flexible enough. It's also simple to code, given that we run through a state machine anyway.

@shlomi-noach
Contributor Author

#6719 implements the above #6689 (comment). It extends the throttling PR, so it's best to merge #6668 first, before reviewing #6719.

@shlomi-noach
Contributor Author

#7221 finalizes the DROP TABLE flow by converting the DROP TABLE statement into RENAME TABLE.
