
Online DDL: tracking issue #6926

Open

shlomi-noach opened this issue Oct 22, 2020 · 14 comments

Comments

@shlomi-noach

This issue will be the tracking space for all things vitess Online DDL. Note that this issue was created after some substantial work had already been done:

#6547 served as a long-running tracking point; I'm pasting some of #6547's content here for background, purpose, and intent.

TL;DR

Automate away all the complexity of schema migrations. Users issue:

alter with 'gh-ost' table example modify id bigint not null;

alter with 'pt-osc' table example modify id bigint not null;

or

$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global \
    ApplySchema -sql "alter with 'gh-ost' table example modify id bigint unsigned not null" commerce

$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global \
    ApplySchema -sql "alter with 'pt-osc' table example modify id bigint unsigned not null" commerce

(syntax subject to change; see #6782)

and vitess will schedule an online schema change operation to run on all relevant shards, then proceed to apply the change via gh-ost or pt-osc on each shard.

The ALTER TABLE problem

First, to reiterate the problem: schema changes have always been a problem with MySQL. A straight ALTER is a blocking operation; an ONLINE ALTER is only "online" on the master/primary, but is effectively blocking on replicas. Online schema change tools like pt-online-schema-change and gh-ost overcome these limitations by emulating the ALTER on a "ghost" table, which is populated from the original table and then swapped into its place.
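To make the pattern concrete, here is a minimal SQL sketch with hypothetical table names; the real tools also handle row backfill, ongoing-change propagation, and an atomic cut-over:

    -- create an empty "ghost" table with the original schema, then alter it while it's empty:
    CREATE TABLE _example_ghost LIKE example;
    ALTER TABLE _example_ghost MODIFY id BIGINT NOT NULL;
    -- the tool copies existing rows and replays ongoing changes
    -- (via triggers in pt-osc, via the binlog in gh-ost), then swaps:
    RENAME TABLE example TO _example_old, _example_ghost TO example;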

Traditionally, online schema changes have been considered "risky". Trigger-based migrations add significant load onto the master server, and their cut-over phase is a known danger point. gh-ost was created at GitHub to address these concerns, and successfully eliminated those operational risks: with gh-ost the load on the master is low and well controlled, and the cut-over phase causes no locking issues. gh-ost comes with different risks: it applies data changes programmatically, so data integrity is of utmost importance. Another concern is data traffic: data flows out of MySQL into gh-ost and back into MySQL (as opposed to staying entirely within MySQL with pt-online-schema-change).

One way or the other, running an online schema change is typically a manual operation. A human schedules the migration, starts it, monitors it, and possibly cuts it over. In a sharded environment, a developer's single ALTER TABLE request explodes into n different migrations, each of which needs to be scheduled, started, monitored, and tracked.

Sharded environments are obviously common for vitess users, so these users feel the pain more than most.

Schema migration cycle & steps

Schema management is a process that begins with the user designing a schema change, and ends with the schema being applied in production. This is a breakdown of schema management steps as I know them:

  1. Design code
  2. Publish changes (pull request)
  3. Review
  4. Formalize migration command (the specific ALTER TABLE or pt-online-schema-change or gh-ost command)
  5. Locate: where in production should this migration run?
  6. Schedule
  7. Execute
  8. Audit/monitor
  9. Cut-over/complete
  10. Cleanup
  11. Notify user
  12. Deploy & merge

What we propose to address

Vitess's architecture uniquely positions it to automate away much of this process. Specifically:

  • Formalize migration command: turning an ALTER TABLE statement into a gh-ost/pt-osc invocation is super useful if done by vitess, since vitess can not only validate the schema/params, but can also provide credentials, apply throttling logic, instruct gh-ost on how to communicate progress via hooks, etc.
  • Locate: given a schema/table, vitess just knows where the table is located. It knows if the schema is sharded. It knows what the shards are and who the shard masters are. It knows where to run gh-ost. Lastly, vitess can tell us which replicas we can use for throttling.
  • Schedule: vitess is again in a unique position to schedule migrations. The fact that someone asks for a migration to run does not mean the migration should start right away. For example, a shard may already be running an earlier migration. Running two migrations at a time is less than ideal, and it's best to wait out the first migration before beginning the second. A scheduling mechanism is useful both for running the migrations in an optimal order/sequence and for providing feedback to the user ("your migration is on hold because this and that", or "your migration is 2nd in queue to run")
  • Execute: vttablet is the ideal entity to run a migration; it can read instructions from the topo server and write progress to the topo server. vitess is aware of possible master failovers and can request a re-execution if a migration is interrupted mid-process.
  • Audit/monitor: the vtctld API can offer endpoints to track the status of a migration (e.g. "in progress on -80, in queue on 80-"). It may offer progress percentage and ETA.
  • Cut-over/complete: in my experience with gh-ost, the cut-over phase is safe to automate away. If running a migration during a resharding operation, we may need to coordinate cut-over between upstream and downstream migrations.
  • Cleanup: the old table needs to be dropped; vttablet runs a table lifecycle service (aka garbage collector) to clean up those tables.

@shlomi-noach

Logic for auto-retrying a migration: #6901 (a rough sketch follows below)

  • If a migration is identified to have started on a different tablet, and
  • has failed (found to be stale),
  • and hasn't been retried yet (a temporary restriction),
  • then it is automatically retried. The main use case is a failover scenario: if the primary fails, tablet and gh-ost alike, then the newly promoted tablet will kick the migration back to life (starting a new migration).

Docs in vitessio/website#571
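As a rough sketch of that logic, assuming migrations are tracked in the _vt.schema_migrations table (the column names used here are illustrative, not necessarily the actual schema):

    -- illustrative only: re-queue a failed migration that started on a different
    -- tablet and has not been retried yet
    UPDATE _vt.schema_migrations
    SET migration_status = 'queued', retries = retries + 1
    WHERE migration_status = 'failed'
      AND tablet != 'this-tablet-alias'
      AND retries = 0;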

shlomi-noach commented Dec 2, 2020

#7083 supports CREATE and DROP statements in ApplySchema to run as online DDL.

shlomi-noach commented Dec 9, 2020

Once #7083 and #7097 are merged:

when ddl_strategy is an online strategy, analyze a DROP TABLE statement to (sketched below):

  • Explode it into single-table statements (a single DROP TABLE statement can name multiple tables)
  • Turn each one into a RENAME TABLE statement, renaming into a HOLD gc state with a 48-hour ETA
  • vttablet should transition away from HOLD even if the ETA is unmet, if table_gc_lifecycle does not include HOLD
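Illustratively (table names hypothetical; the actual generated names embed a UUID and a timestamp hint, and the exact format may differ):

    -- a single multi-table statement:
    DROP TABLE t1, t2;
    -- is exploded and rewritten into per-table renames into the HOLD gc state:
    RENAME TABLE t1 TO _vt_HOLD_<uuid>_<timestamp>;
    RENAME TABLE t2 TO _vt_HOLD_<uuid>_<timestamp>;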

shlomi-noach commented Dec 9, 2020

  • support CREATE INDEX statements as online DDL, and convert them to ALTER TABLE syntax -- required by gh-ost.
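For example, with hypothetical table/index names, the rewrite is the standard MySQL equivalence:

    CREATE INDEX name_idx ON example (name);
    -- is converted to:
    ALTER TABLE example ADD INDEX name_idx (name);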

@shlomi-noach

Online DDL via VReplication

At this time I have clarity as to how this will look.

On one hand, we will go fully native and reuse Vitess's existing mechanisms. On the other hand, we break away from the existing flow in multiple ways.

POC is in #7419. It is just the beginning of what online DDL via vreplication will look like -- but it already has the initial implementation running.

Some design bullet points:

  • We will use vreplication to replicate from a primary tablet onto itself (same keyspace, same shard, same tablet, as both source and target)
  • Our existing vreplication flows are owned by the user. The user e.g. runs a MoveTables, then runs SwitchReads, then runs SwitchWrites -- based on how the user perceives the process is going.
    However, we want schema migrations to be fully automated. Not only do we want to own the starting of the migration (remember, our tablets schedule migrations sequentially), but we also want to reliably and automatically cut over the migration.
  • To that effect, tabletserver will be the owner of schema migrations/vreplication. As with gh-ost and pt-osc, a shard's primary tablet has the independence to schedule the next migration, run it, potentially cancel or retry it, and follow it to completion. This means no vtctl, no wrangler.
  • Another important difference from normal VReplication flows is that in all other flows there are source table(s) and target table(s), and they are distinct. In MoveTables we can go as far as setting routing rules so that new queries route to the target tables. In a schema migration, we want to replace the original table (via RENAME TABLE). We will do this underneath the feet of VReplication.
  • As opposed to gh-ost or pt-osc, vreplication migrations can natively survive a failover (in fact, this will be one of the more important advantages of vreplication-based migrations); this calls for some redesign in onlineddl.Executor
  • We import some initial-setup analysis and preparation code from gh-ost. Some of it is redundant and will be cleaned up. Some of it overlaps with Vitess functionality (parsing) and will be replaced. But for now we know it's stable and working.

Starting a vreplication schema migration

The flow for starting a vreplication-based schema migration from tabletserver is (a rough SQL sketch follows the list):

  • ensure there's no vreplication stream on the same workflow
  • create an empty table LIKE the original table
  • apply the ALTER TABLE statement onto the empty table
  • analyze both tables and validate that some basic constraints are met (this is largely imported from gh-ost)
  • evaluate the vreplication source filter query, which selects only the relevant columns and takes care of column renames and generated columns
  • reload the schema (because a previous migration might have introduced e.g. a new column)
  • create a new vreplication entry
  • start vreplication
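A rough SQL sketch of the setup and the filter query, assuming an ALTER that renames column info to details (all names hypothetical):

    -- create the empty target table and apply the change to it:
    CREATE TABLE _example_vrepl LIKE example;
    ALTER TABLE _example_vrepl CHANGE info details VARCHAR(255) NOT NULL;
    -- the evaluated source filter query selects only the relevant columns,
    -- handling the rename (a generated column would simply be omitted):
    SELECT id, info AS details FROM example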

Tracking a running migration

  • onlineddl.Executor keeps track of vreplication stream liveness by looking at the _vt.vreplication entry
  • identify stale migrations and remove them
  • determine that a migration is ready: the copy phase is complete, pos is non-empty, and time-updated and transaction-time are both up-to-date (little or no lag); see the sketch below
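For example, a readiness check could read the stream's entry (the workflow value is hypothetical; pos, time_updated, and transaction_timestamp are columns of _vt.vreplication):

    SELECT pos, time_updated, transaction_timestamp
    FROM _vt.vreplication
    WHERE workflow = 'my_migration_uuid';
    -- ready when the copy phase is complete, pos is non-empty,
    -- and both timestamps are near-current (little or no lag)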

Cutting over a vreplication migration

The flow is (the final swap is sketched after the list):

  • We don't use wrangler; instead, we create our own tablet manager client.
  • Get hold of Tablet
  • Get hold of ShardInfo
  • Get hold of topology server (already exists in onlineddl.Executor)
  • read vreplication information
  • lock keyspace
  • stop writes on source
  • read up-to-date stream (specifically, pos)
  • wait for pos
  • stop vreplication; it is now up to date with the original table
  • swap the source and target tables from beneath vreplication
  • all good
  • (in the future, this flow will actually continue; more writeup in the future)
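The swap at the heart of the cut-over is a single atomic statement, sketched here with hypothetical names:

    -- with writes stopped and vreplication caught up to pos:
    RENAME TABLE example TO _example_old, _example_vrepl TO example;
    -- queries now hit the migrated table; _example_old is left for the
    -- table lifecycle service (garbage collector) to drop later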

More to come.

shlomi-noach commented Feb 16, 2021

It should be possible to ReloadTable, like ReloadSchema but for a single table. This is desirable for online DDL: reloading an entire schema takes time, and we cannot expect it to complete within the timeframe of a cut-over, but reloading a single table should be just fine.

  • support ReloadTable

@shlomi-noach

Revert for Online DDL is now available (per PR review) via #7478

@ajm188 removed the P2 label Mar 9, 2023