Tracking issue: safe, lazy, managed DROP TABLE #6689
Comments
The actual implementation will be in
This means a user-initiated There just might be a limbo state, where the user e.g. runs |
to standardize statements, perhaps: DROP IN '24h' TABLE my_table;
DROP IN NOWAIT TABLE my_table;
DROP IN DEFAULT TABLE my_table; |
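
To make the standardized form above concrete, here is a rough sketch of how such statements might be interpreted. It is only an illustration, not how Vitess parses SQL: `parse_managed_drop` and `DEFAULT_HOLD` are hypothetical names, the 3-day default is borrowed from the "e.g. 3 days" in the issue description, and only the `h` and `min` units that appear in the examples are handled.

```python
import re
from datetime import timedelta
from typing import Tuple

# Hypothetical default HOLD time; the issue only suggests "e.g. 3 days".
DEFAULT_HOLD = timedelta(days=3)

# Only the units that appear in the examples are supported in this sketch.
_UNITS = {"h": "hours", "min": "minutes"}

_STATEMENT = re.compile(
    r"^\s*DROP\s+IN\s+"
    r"(?:'(?P<amount>\d+)(?P<unit>[a-z]+)'|(?P<keyword>NOWAIT|DEFAULT))"
    r"\s+TABLE\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE,
)


def parse_managed_drop(statement: str) -> Tuple[str, timedelta]:
    """Return (table name, hold duration) for a `DROP IN ... TABLE` statement."""
    match = _STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"not a managed DROP statement: {statement!r}")
    keyword = match.group("keyword")
    if keyword is not None:
        # NOWAIT means no hold period at all; DEFAULT lets the system choose.
        hold = timedelta(0) if keyword.upper() == "NOWAIT" else DEFAULT_HOLD
    else:
        unit = match.group("unit").lower()
        if unit not in _UNITS:
            raise ValueError(f"unsupported duration unit: {unit!r}")
        hold = timedelta(**{_UNITS[unit]: int(match.group("amount"))})
    return match.group("table"), hold
```

A zero hold duration would correspond to renaming straight to the purge stage, with no window for reverting.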
In some cases, performing incremental row deletes might not be needed and could be skipped. |
We need to decide whether to allow alternate methods/flows for dropping tables:
Taking, for example, the request above (skip purging process): do we support that? If so, do we set this up with a command line flag? Or as part of the |
I think I've found a nice flow that is also flexible and adjustable to fit different users' needs. The general table lifecycle flow is this, and in this order:
Vitess will transition the table through these states, in order. But it will also support skipping some states. Let's first explain the meaning of the states:
- Real table: well, this is obvious. Table is used by the app.
- HOLD: Table is renamed to something like
- PURGE: Table renamed to e.g.
- EVAC: Table renamed to e.g.
- DROP: Table renamed to e.g.
- gone: end of lifecycle
Transitioning and skipping of states
The above lifecycle will be the default, and safest cycle. It is also the longest. Some users must use this cycle (in my history, I've seen cases where dropping a single-row table would lock production for minutes). Others have no issue dropping a table with millions of rows, and don't want to pay the IO+time of purging the data. We can introduce a
I think this method is flexible enough. It's also simple to code, given we run through a state machine, anyway. |
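
As a rough illustration of the state machine idea, the sketch below walks the lifecycle in order and jumps over states the user chose to skip. It is not the Vitess implementation: `LIFECYCLE`, `SKIPPABLE` and `next_state` are hypothetical names, the labels "in use" and "gone" are picked for this example, and treating only HOLD, PURGE and EVAC as skippable is an assumption.

```python
from typing import FrozenSet, Optional

# Lifecycle order as listed in the comment; "in use" and "gone" are labels
# chosen for this sketch.
LIFECYCLE = ["in use", "HOLD", "PURGE", "EVAC", "DROP", "gone"]

# Assumption: only these intermediate states may be skipped.
SKIPPABLE = frozenset({"HOLD", "PURGE", "EVAC"})


def next_state(current: str, skipped: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return the state following `current`, ignoring states in `skipped`.

    Returns None once the table is gone (end of lifecycle).
    """
    not_allowed = set(skipped) - SKIPPABLE
    if not_allowed:
        raise ValueError(f"states cannot be skipped: {sorted(not_allowed)}")
    position = LIFECYCLE.index(current)  # raises ValueError for unknown states
    for state in LIFECYCLE[position + 1:]:
        if state not in skipped:
            return state
    return None
```

With nothing skipped, a table goes through every state; a user who does not want to pay for purging could pass `frozenset({"PURGE"})` and move from HOLD straight to EVAC.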
#6719 implements the above #6689 (comment). It extends the throttling PR, so it's best to first merge #6668 before reviewing #6719. |
#7221 finalizes the |
`DROP TABLE` is a risky MySQL operation in production. There seem to be multiple components involved, the major one being that if the table has pages in InnoDB's buffer pool (one or many), then the buffer pool is locked for the duration of the `DROP`. The duration of the `DROP` is also related to the time it takes the operating system to delete the `.ibd` file associated with the table (assuming `innodb_file_per_table`).
It is noteworthy that the problem is in particular on the `primary` (`master`) node; replicas are not affected as much.
Different companies solve `DROP TABLE` in different ways. An interesting discussion is found on `gh-ost`'s repo and on MySQL bugs:
Solutions differ in implementation, but all suggest waiting for some time before actually dropping the table. That alone requires management around `DROP TABLE` operations. As explained below, waiting enables reverting the operation.
Vitess should automate table drops and make the problem transparent to the user as much as possible. Breakdown of the suggested solution follows.
Illustrating the DROP steps
We can make the `DROP` management stateful or stateless. I opt for stateless: no meta tables to describe the progress of the `DROP`. The state should be inferred from the tables themselves. Specifically, we will encode hints in the table names.
We wish to manage `DROP` requests. Most managed `DROP` requests will wait before destroying data. If the user issued a `DROP TABLE` only to realize the app still expects the table to exist, then we make it possible to revert the operation.
This is done by first issuing a `RENAME TABLE my_table TO something_else`. To the app, it seems like the table is gone; but the user may easily restore it by running the revert query: `RENAME TABLE something_else TO my_table`.
That `something_else` name can be e.g. `_vt_HOLD_2201058f_f266_11ea_bab4_0242c0a8b007_20200910113042`.
At some point we decide that we can destroy the data. The "hold" period can either be determined by Vitess or explicitly by the user. E.g. on a successful schema migration completion, Vitess can choose to purge the "old" table right away.
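
A minimal sketch of the hold-and-revert step, to show how the name and the two `RENAME TABLE` statements fit together. The helper names are hypothetical. It assumes the middle part of the name is a UUID with dashes replaced by underscores and that the trailing 14 digits are a `YYYYMMDDhhmmss` timestamp; identifier quoting is left out for brevity.

```python
from datetime import datetime


def hold_table_name(uuid: str, renamed_at: datetime) -> str:
    """Build a `_vt_HOLD_<uuid>_<timestamp>` name (format is an assumption)."""
    return "_vt_HOLD_{}_{}".format(
        uuid.replace("-", "_"),
        renamed_at.strftime("%Y%m%d%H%M%S"),
    )


def rename_statement(source: str, target: str) -> str:
    """Build the RENAME TABLE statement; table names are not quoted here."""
    return f"RENAME TABLE {source} TO {target}"


hold_name = hold_table_name(
    "2201058f-f266-11ea-bab4-0242c0a8b007",
    datetime(2020, 9, 10, 11, 30, 42),
)
# Hide the table from the app:
hide_sql = rename_statement("my_table", hold_name)
# Revert while the table is still on hold:
revert_sql = rename_statement(hold_name, "my_table")
```

Because the table is only renamed, the revert is the same statement with the two names swapped.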
At that stage we rename the table to e.g. `_vt_PURGE_63b5db0c_f25c_11ea_bab4_0242c0a8b007_20200911070228`.
A table by that name is eligible to have its data purged.
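
Since the state lives in the table name, finding tables eligible for purging comes down to pattern matching. The sketch below is one possible way to do that, not the actual Vitess code: `parse_lifecycle_name` and `purge_candidates` are hypothetical helpers, the name layout is inferred from the examples in this issue, and `EVAC` is included only because it is mentioned in the comments.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Layout inferred from the example names: _vt_<STATE>_<uuid>_<YYYYMMDDhhmmss>
_NAME = re.compile(
    r"^_vt_(HOLD|PURGE|EVAC|DROP)_"
    r"([0-9a-f]{8}(?:_[0-9a-f]{4}){3}_[0-9a-f]{12})_"
    r"(\d{14})$"
)


class LifecycleName(NamedTuple):
    state: str
    uuid: str
    renamed_at: datetime


def parse_lifecycle_name(table_name: str) -> Optional[LifecycleName]:
    """Decode the hints in a table name; None for ordinary tables."""
    match = _NAME.match(table_name)
    if match is None:
        return None
    state, uuid, stamp = match.groups()
    return LifecycleName(state, uuid, datetime.strptime(stamp, "%Y%m%d%H%M%S"))


def purge_candidates(table_names: Iterable[str]) -> List[str]:
    """Return the names of tables currently in the PURGE state."""
    candidates = []
    for name in table_names:
        parsed = parse_lifecycle_name(name)
        if parsed is not None and parsed.state == "PURGE":
            candidates.append(name)
    return candidates
```

No meta table is consulted: listing the tables in the schema is enough to recover the state of every pending drop.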
In my experience (see `gh-ost` issue above), a safe method to purge data is to slowly remove rows, until the table is empty. Note:
- `SET SQL_LOG_BIN=0` and only purge the table on the `primary`. This reduces binlog size and also does not introduce replication lag. One may argue that lag-based throttling is not needed, but in my experience it's still wise to use, since replication lag can imply load on `primary`, and it's best to not overload the `primary`.
- `DELETE FROM my_table LIMIT 50`; 10-100 are normally good chunk sizes. Order of purging does not matter.
It's important to note that the `DELETE` statement actually causes table pages to load into the buffer pool, which works against us.
Once all rows are purged from a table, we rename it again to e.g. `_vt_DROP_8a797518_f25c_11ea_bab4_0242c0a8b007_20200911234156`. At this time we point out that `20200911234156` is actually a readable timestamp, and stands for `2020-09-11 23:41:56`. That timestamp can tell us when the table was last renamed.
Vitess can then run an actual `DROP TABLE` for `_vt_DROP_...` tables whose timestamp is older than, say, 2 days. As mentioned above, purging the table actually caused the table to load onto the buffer pool, and we need to wait for it to naturally get evicted, before dropping it.
Asking for a safe DROP
We can introduce a new `DROP TABLE` syntax. The user will issue `DROP TABLE` with hints to Vitess. Examples could be:
- `DROP IN '24h' TABLE my_table`: `my_table` renamed to `_vt_HOLD_...` where it will spend at least 24h before transitioning to `_vt_PURGE...`. The user will have 24h to regret and revert the `DROP`.
- `DROP IN '30min' TABLE my_table`: same, 30min only
- `DROP NOWAIT TABLE my_table`: the user indicates they're sure they won't regret the `DROP`. We save time and immediately rename to `_vt_PURGE_...`
- `DROP DEFAULT TABLE my_table`: Vitess chooses the `HOLD` time (e.g. 3 days)
Vitess can internally choose to drop tables, my immediate example is with automated schema migrations. Whether successful or failed, it's generally safe to purge the artifact tables immediately.