Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OnlineDDL: "cancel-all" command to cancel all pending migrations in keyspace #7099

Merged
merged 7 commits into from
Dec 7, 2020

Conversation

shlomi-noach
Copy link
Contributor

This PR adds a vtctl OnlineDDL <keyspace> cancel-all command.

The command is pretty strong: it will cancel any pending migration for the given keyspace. A migration is pending if it's running or about to run, ie in one of these states:

  • queued
  • ready
  • running

I have a specific use case for cancel-all, where there's need to re-evaluate the migrations for a given keyspace; I can think of another use case: some BigOperation needs to take place on the keyspace and the operations team wishes to abort pending migrations.

General context: #6926

The cancel-all command cancels all pending (queued, ready, running) migrations in a keyspace

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach requested a review from a team December 2, 2020 19:56
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…-all

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Documentation PR: vitessio/website#622

@harshit-gangal
Copy link
Member

I think we might also want to cancel any ongoing migration as well. If it is realised that it cannot run at the moment may be because of wrong query or time of change is wrong or it can be grouped better with newer change, etc.

I suggest to add a parameter to this command which can be force cancel.

@shlomi-noach
Copy link
Contributor Author

I think we might also want to cancel any ongoing migration as well.

This already exists: https://vitess.io/docs/user-guides/schema-changes/managed-online-schema-changes/#cancelling-a-migration

e.g. vtctlclient OnlineDDL commerce cancel 2201058f_f266_11ea_bab4_0242c0a8b007

will cancel a speciic running migration, or, if it's not running yet, will ensure it doesn't run.

cancel-all, suggested in this PR, is based on the above.

@harshit-gangal
Copy link
Member

May be I miss read the description.

@shlomi-noach
Copy link
Contributor Author

So, we already have a per-migration cancel; what this PR adds is "cancel any migration that is pending on this keyspace", which is a pretty massive and risky operation, but fits really well with an internal POC I'm running, and can be useful fo remergency situations in production.

Comment on lines +190 to +191
time.Sleep(10 * time.Second)
checkCancelAllMigrations(t, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this sleep time be reduced. It just delays the e2e test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately all the online-DDL flow works asynchronously, with check intervals. In this endtoend tests intervals are reduced to just a few secs, and still you have to give time for the check, for gh-ost, for callback...

This is why I extracted online-ddl to be on its own CI shard; it's still shorter than our other endtoend tests so right now I feel like it's not worth investing the optimization effort.

Comment on lines +900 to +905
func (e *Executor) cancelPendingMigrations(ctx context.Context) (result *sqltypes.Result, err error) {
parsed := sqlparser.BuildParsedQuery(sqlSelectPendingMigrations, "_vt")
r, err := e.execQuery(ctx, parsed.Query)
if err != nil {
return result, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there can be race condition right between running and completed Migration. For my understanding, what will happen when a running migration just got completed after it is selected in pending migrations Or when cancelPendingMigration is in progress?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, and this race condition is unavoidable. If the user chooses to "cancel" or "cancel-all" just when the migration is about to complete, it's possible that the migration will win the race and complete; in which case the "cancel" is a no-op and nothing happens (the migration is just complete)

@harshit-gangal harshit-gangal merged commit d165ec4 into vitessio:master Dec 7, 2020
@harshit-gangal harshit-gangal deleted the online-ddl-cancel-all branch December 7, 2020 05:12
@askdba askdba added this to the v9.0 milestone Dec 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants