Feature Request: Online DDL cut-over backoff + forced completion #14530

Closed
shlomi-noach opened this issue Nov 16, 2023 · 0 comments · Fixed by #14546
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Feature Request


Feature Description

An Online DDL ALTER TABLE completes by cutting over from the original table to the shadow table. This final step involves holding table locks, and has a timeout.

On very busy tables, the operation will time out. The Online DDL scheduler will reattempt after 1 minute. Under sustained load this could mean repeated attempts over hours at 1-minute intervals. This is both wasteful and harmful. It's harmful because for 15sec in every minute the cut-over attempts to acquire locks, which interferes with traffic even more.

We want to offer two opposed changes at the same time:

  1. A backoff mechanism: first retry in 1min, then in, say, 5min, then 10min, 30min, 1hr, and from then on at 1hr intervals (precise values subject to change).
  2. A way to require a brute-force cut-over. This involves:
     • A pre-determined brute-force cut-over duration: counting from the moment of the first cut-over attempt, once the given duration has passed, Online DDL attempts a brute-force cut-over (see the last bullet).
     • And/or a SQL command such as ALTER VITESS_MIGRATION ... DO THE THING AND BRUTE FORCE CUT OVER NOW PLEASE
     • Brute-force cut-over is implemented by identifying any queries + transactions holding locks on the migrated table. When in brute-force mode, the cut-over mechanism attempts to kill the related queries/connections.

Industry solutions typically attempt to kill any non-replication long-running queries. We want to be smart and only affect relevant queries. We also want to identify transactions that are holding locks on the table without currently running a query on that table, or possibly without running any query at all.

Use Case(s)

Online DDL on busy systems
