Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemadiff: normalize table options case #10200

Merged
merged 9 commits into from
May 4, 2022

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented May 3, 2022

Description

Follow up to #10188, this PR normalizes values of table options, upon reading table definition.

for example, the engine name for InnoDB is always InnoDB at that specific case. Even though INNODB is valid, we will always transform it to InnoDB.

Character sets and collations are always lower cased. ROW_FORMAT values are always upper cased.

We also now explicitly mutate utf8 to utf8mb3.

Related Issue(s)

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving release notes none labels May 3, 2022
@shlomi-noach shlomi-noach requested a review from dbussink May 3, 2022 09:24
@github-actions
Copy link
Contributor

github-actions bot commented May 3, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has the correct release notes label. release notes none should only be used for PRs that are so trivial that they need not be included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@dbussink dbussink force-pushed the schemadiff-normalize-table-options branch from c97ceb5 to 9cfb746 Compare May 3, 2022 10:03
Comment on lines 192 to 220
var integralTypes = map[string]bool{
"tinyint": true,
"smallint": true,
"mediumint": true,
"int": true,
"bigint": true,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move these to mysql.go. There's other stuff I will move there.

@shlomi-noach
Copy link
Contributor Author

Tracking issue: #10203

@shlomi-noach
Copy link
Contributor Author

We have a legitimate endtoend test failure: https://github.com/vitessio/vitess/runs/6272304441?check_suite_focus=true

2022-05-03T11:53:18.3537021Z === RUN   TestSchemaChange/datetime-to-timestamp/validate_diff
2022-05-03T11:53:18.3537463Z     schemadiff_vrepl_suite_test.go:366: 
2022-05-03T11:53:18.3537981Z         	Error Trace:	schemadiff_vrepl_suite_test.go:366
2022-05-03T11:53:18.3538717Z         	            				schemadiff_vrepl_suite_test.go:256
2022-05-03T11:53:18.3539103Z         	Error:      	Not equal: 
2022-05-03T11:53:18.3542510Z         	            	expected: "CREATE TABLE `onlineddl_test` (\n  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,\n  `i` int(11) NOT NULL,\n  `ts0` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,\n  `ts1` timestamp NULL DEFAULT NULL,\n  `dt2` datetime DEFAULT NULL,\n  `t` timestamp NULL DEFAULT NULL,\n  `updated` tinyint(3) unsigned DEFAULT '0',\n  PRIMARY KEY (`id`),\n  KEY `i_idx` (`i`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8"
2022-05-03T11:53:18.3546522Z         	            	actual  : "CREATE TABLE `onlineddl_test` (\n  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,\n  `i` int(11) NOT NULL,\n  `ts0` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,\n  `ts1` timestamp NULL DEFAULT NULL,\n  `dt2` datetime DEFAULT NULL,\n  `t` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',\n  `updated` tinyint(3) unsigned DEFAULT '0',\n  PRIMARY KEY (`id`),\n  KEY `i_idx` (`i`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8"
2022-05-03T11:53:18.3547431Z         	            	
2022-05-03T11:53:18.3547788Z         	            	Diff:
2022-05-03T11:53:18.3548274Z         	            	--- Expected
2022-05-03T11:53:18.3548654Z         	            	+++ Actual
2022-05-03T11:53:18.3549138Z         	            	@@ -6,3 +6,3 @@
2022-05-03T11:53:18.3549705Z         	            	   `dt2` datetime DEFAULT NULL,
2022-05-03T11:53:18.3550358Z         	            	-  `t` timestamp NULL DEFAULT NULL,
2022-05-03T11:53:18.3551225Z         	            	+  `t` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
2022-05-03T11:53:18.3552051Z         	            	   `updated` tinyint(3) unsigned DEFAULT '0',
2022-05-03T11:53:18.3552734Z         	Test:       	TestSchemaChange/datetime-to-timestamp/validate_diff
2022-05-03T11:53:18.3553517Z         	Messages:   	mismatched table structure. ALTER query was: ALTER TABLE `onlineddl_test` MODIFY COLUMN `t` timestamp
2022-05-03T11:53:18.3554015Z     schemadiff_vrepl_suite_test.go:376: 
2022-05-03T11:53:18.3554523Z         	Error Trace:	schemadiff_vrepl_suite_test.go:376
2022-05-03T11:53:18.3555260Z         	            				schemadiff_vrepl_suite_test.go:256
2022-05-03T11:53:18.3556472Z         	Error:      	Expected nil, but got: &schemadiff.AlterTableEntityDiff{from:(*schemadiff.CreateTableEntity)(0xc0003fef00), to:(*schemadiff.CreateTableEntity)(0xc0003ff270), alterTable:(*sqlparser.AlterTable)(0xc000476c60)}
2022-05-03T11:53:18.3557412Z         	Test:       	TestSchemaChange/datetime-to-timestamp/validate_diff

Copy link
Contributor

@dbussink dbussink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here now better normalize / simplify DDL which means few cases of different behavior based on underlying MySQL (patch) versions and fewer potential conflicts when diffing.

"set": true,
}

// This is the default charset for MySQL 8.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that right now schemadiff is opinionated in the sense that it targets specifically how MySQL 8.0 works.

// a timestamp defaults to `NOT NULL`.
//
// We opt here to instead remove that difference and always then add `NULL` and treat
// `explicit_defaults_for_timestamp` as always enabled in the context of DDL for diffing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case where the MySQL 8.0 behavior is considered the standard and the behavior we assume.

Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I would use the authoritative list of collations from our collations package because it's going to be kept up to date more consistently. :)

go/vt/schemadiff/mysql.go Outdated Show resolved Hide resolved
shlomi-noach and others added 9 commits May 4, 2022 09:53
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
This adds a number of additional normalization rules. First it enforces
lowercasing for a number of column level options, namely the type,
charset & collate. It then also maps utf8 always to utf8mb3.

Additionally, we drop any length definitions for integral types.

Another import one is that we drop any charset / collate definitions
that match what the default table level option already is. This ensures
a smaller diff and no conflicts on differences there. Different MySQL
versions behavior differently there and this creates a consistent
situation.

Lastly it also fixes ENGINE references in PARTITION clauses to be
handled consistently.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
When a column is defined with an explicit charset, but no collation, it
defaults to the default collation for that charset, not the collation
defined at the table level.

This means that we need to handle this case in normalization. This is
solved by solving the kind of "reverse" situation. What we do is still
remove the charset definition, but if it is specified and diverges from
the default collation at the table level, we add the collation
explicitly at the column level.

This is more clear for the end user what is happening and they are using
a different collation on the column (even though the CREATE syntax kinda
implicitly had that side effect in MySQL).

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Timestamps have a fun runtime behavior that's dependent on the
explicit_defaults_for_timestamp setting in MySQL.

By default that's off in 5.7 and turned on in 8.0. With that flag
disabled, MySQL will create timestamp fields with `NOT NULL` by default.

In our normalization, we always opt for the behavior that 8.0 has from a
DDL perspective when we parse things, assuming a `NULL` that is missing
means `NULL` as documented. To ensure things work consistently then with
5.7 as well, in the normalized form we do add `NULL` to the DDL always
to ensure we create things as expected.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Depend on the collation package for all needed collation information.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
@dbussink dbussink force-pushed the schemadiff-normalize-table-options branch from f5eba03 to 9e6ccc3 Compare May 4, 2022 07:55
Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vmg vmg merged commit 46cb467 into vitessio:main May 4, 2022
@vmg vmg deleted the schemadiff-normalize-table-options branch May 4, 2022 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants