OnlineDDL: an --in-order-completion migration should fail if prior migrations have failed (limit to same context) #16070

Closed
shlomi-noach opened this issue Jun 5, 2024 · 0 comments · Fixed by #16071
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Bug

Comments

@shlomi-noach
Contributor

The DDL strategy flag --in-order-completion is designed to make migrations complete in the same order they were submitted. Per the docs:

a migration that runs with this DDL strategy flag may only complete if no prior migrations are still pending (pending means either queued, ready or running states). --in-order-completion considers the order by which migrations were submitted. Note that --in-order-completion still allows concurrency. In fact, it is designed to work with concurrent migrations. The idea is that while many migrations may run concurrently, they must complete in-order.

This lets the user submit multiple migrations which may have some dependencies (for example, introduce two views, one of which reads from the other). As long as the migrations are submitted in a valid order, the user can then expect vitess to complete the migrations successfully (and in that order).

The point of in-order is to be able to handle dependencies. As the simplest example, you cannot create a view v that reads from table t before you've created table t. It should therefore stand that if creation of t failed, it's pointless to attempt to create v.

In the current vitess behavior, a migration waits for chronologically prior migrations before running or completing, but it only considers pending migrations, i.e. those in the queued, ready, or running state. We should now also consider that if any prior migration has failed or was cancelled, then there's no point in running our next migration.

Of course, there could be dozens of failed migrations in months of history. We therefore should limit the search to migrations in the same migration context.

This logic should apply in two scenarios:

  1. Do not even start an in-order migration where a previous migration has failed (or was cancelled)
  2. When concurrent migrations are enabled and running, fail a running migration if a prior migration has failed (or was cancelled).
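
As a rough illustration of the proposed check (not the actual vitess implementation), the sketch below models it in Python. The `Migration` type, its field names, and the helpers `blocking_prior_failure` and `should_fail_in_order` are all hypothetical; only the state names (queued, ready, running, failed, cancelled) and the "same migration context" restriction come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# State names as used in this issue; everything else here is hypothetical.
FAILURE_STATES = {"failed", "cancelled"}


@dataclass
class Migration:
    uuid: str
    context: str        # migration context the migration was submitted under
    submit_order: int   # lower value means submitted earlier
    status: str         # e.g. queued, ready, running, failed, cancelled
    in_order_completion: bool = True


def blocking_prior_failure(
    m: Migration, all_migrations: List[Migration]
) -> Optional[Migration]:
    """Return the earliest failed/cancelled migration that was submitted
    before `m` in the same context, or None if there is no such migration."""
    if not m.in_order_completion:
        return None  # the flag is not set, so the rule does not apply
    candidates = [
        p
        for p in all_migrations
        if p.context == m.context              # limit the search to the same context
        and p.submit_order < m.submit_order    # only chronologically prior migrations
        and p.status in FAILURE_STATES
    ]
    return min(candidates, key=lambda p: p.submit_order, default=None)


def should_fail_in_order(m: Migration, all_migrations: List[Migration]) -> bool:
    """Covers both scenarios: applied to a queued/ready migration it prevents
    the start; applied to a running migration it fails it mid-flight."""
    return blocking_prior_failure(m, all_migrations) is not None
```

For example, with migrations `a` (failed), `b` (queued) and `d` (running) in one context and `c` (running) in another, `should_fail_in_order` would be true for `b` and `d` and false for `c`, since the failure of `a` is outside the context of `c`.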