Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemadiff: tracking issue #10203

Open
shlomi-noach opened this issue May 3, 2022 · 0 comments
Open

schemadiff: tracking issue #10203

shlomi-noach opened this issue May 3, 2022 · 0 comments
Assignees
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: RFC Request For Comment

Comments

@shlomi-noach
Copy link
Contributor

shlomi-noach commented May 3, 2022

Fashionably late, setting up a tracking issue for schemadiff, introduced in #9719.

General overview

schemadiff is a vitess library that manages schema definitions, and is able to present a logical diff between two tables, two views, or two schemas. It does so declaratively, without accessing a MySQL server, and is based on sqlparser, a core library in vitess.

Entities

schemadiff recognizes these entities:

  • A table, mapped to a database TABLE
  • A view, mapped to a database VIEW
  • A schema, a collection of tables and views

Tables and views are also entities known to sqlparser's AST. The parser can parse CREATE TABLE or CREATE VIEW statements into well formed constructs. schemadiff's main logic is to utilize these well formed constructs, comparing them, manipulating them or validating them.

Diffs

At the heart of schemadiff is to be able to diff:

  • Two tables, resulting in a potentially empty ALTER TABLE diff
  • Two views, resulting in a potentially empty ALTER VIEW diff
  • Two schemas, resulting in a potentially empty list of DROP/ALTER/CREATE TABLE/VIEW statements

A resulting diff is a well formed construct in itself. e.g. the diff of two tables, if not empty, is a ALTER TABLE, represented by sqlparser's matching AST construct. The diff also attains context to its "from" and "to" entities.

Taking a tables diff as the classic example, the diff is an ALTER TABLE query that takes a table from one format to another (ignoring table name). The alteration may include:

  • Adding, removal, modification of columns
  • Adding, removal of keys
  • Adding, removal of foreign keys
  • Adding, removal, modification of table options
  • Changing of partitioning scheme

Input and output

schemadiff reads queries, or statements. Because it does not access a MySQL server, by design, it is fed with CREATE statements that depict a table, view or schema.
SQL can have many formatting options, and a table can be presented in multiple ways (see normalization below). It is advisable to feed schemadiff with formal output of a SHOW CREATE TABLE or SHOW CREATE VIEW statements.

Entity constructs can export their well formed constructs, or they can export their CREATE SQL statements. For example, you may export a schema into a list of SQL queries via ToQueries() or into one big blob of definitions with ToSQL().

Diffs can export their formal constructs, or they can export their SQL statements.

Validation

We require schemadiff some validations over the structure of the schema/tables/views. It is unfeasible to run all validations in a declarative aproach. Here are some examples to validations schemadiff can do/should do, and some that it can't/won't:

Attainable validations:

  • A schema cannot have two tables of same name, or views, or combinations
  • A table cannot have two columns of same name
  • A table's index must only reference existing columns
  • A partition schema must only refer to existing columns
  • A view must only reference existing tables/views
  • Circular view dependency is not allowed
  • etc.

Validations we will probbaly not attain:

  • Maximum indexed character length
  • Maximum character row length
  • etc.

Normalization

schemadiff needs to be able to normalize a schema. SQL allows equivalent or implicit definitions that present different statements. For example:

  • Statements are case insensitive; we want to normalize an alter table to ALTER TABLE
  • Identifies can usually go unqualified. We want to always qualify them.
  • Character set names are case insensitive. We want to choose a format.
  • A nullable column defaults to NULL whether or not it specifies DEFAULT NULL, assuming no other DEFAULT value defined
  • An int is signed whether signed is specified or not, as long as unsigned isn't specified
  • Types like int and attributes like unsigned are lower case in MySQL canonical output and we match that format.
  • A schema can have tables defined in any order. We want alphabetical ordering where possible
  • Ensure view definitions are ordered according to dependency tree
  • Its possible to clear a column's charset/collation definition if they are identical to the table's
  • etc.

Applying diffs

schemadiff also supports the notion of applying schema changes declaratively, again without a MySQL server. It only supports applying Diffs generated by schemadiff itself.

For example, given a "from" and a "to" tables, schemadiff core functionality is to be able to generate a AlterTableEntityDiff. It is then possible to Apply() that diff onto the CreateTableEntity "from" table, and get a CreateTableEntity equivalent to the "to" table.

By applying changes logically, and using validations as mentioned above, schemadiff is able to catch table definition errors without having to contact a server. For example, it can verify that an ALTER TABLE ... ADD KEY ... only uses existing columns; or, given a ALTER TABLE ... DROP COLUMN ... it can edit or remove keys referencing that column.


schemadiff is work in progress. Linked below this comment are relevant pull requests and issues.

@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving Type: RFC Request For Comment labels May 3, 2022
@shlomi-noach shlomi-noach self-assigned this May 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: RFC Request For Comment
Projects
None yet
Development

No branches or pull requests

1 participant