Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ApplySchema fails on DDLs which don't result in a schema change (e.g. "drop table if exists" on a non-existing table) #1731

Closed
ashudeep-sharma-zz opened this issue May 26, 2016 · 9 comments

Comments

@ashudeep-sharma-zz
Copy link
Contributor

In general application owners may want to write the DDL scripts in the following manner:
drop table if exists shipment;
create table shipment (shipmentid bigint(20) not null, trackingid varchar(15) default null, createdatetime datetime default current_timestamp,isactive tinyint(1) not null default 0, primary key (shipmentid));
...
...

However if the table shipment doesn;t exist initially the command drop table if exists shipment; should gracefully execute and after that create the shipment table, but when we try to execute the above two commands when the table shipment doesn;t exists, it gives the following error:

E0526 09:22:49.821551 6785 main.go:43] Remote error: rpc error: code = 13 desc = invalid header field value "Schema change failed, ExecuteResult: {\n "FailedShards": null,\n "SuccessShards": null,\n "CurSQLIndex": 0,\n "Sqls": [\n "drop table if exists shipment",\n "create table shipment (shipmentid bigint(20) not null, trackingid varchar(15) default null, createdatetime datetime default current_timestamp,isactive tinyint(1) not null default 0, primary key (shipmentid))",\n "create table runsheet (runsheetid bigint(20) not null, createdatetime datetime default current_timestamp, isactive tinyint(1) not null default 1, deliverydate datetime default null, primary key (runsheetid))",\n "create table runsheet_shipments(runsheetid bigint(20), shipmentid bigint(20), primary key(runsheetid, shipmentid))",\n "create table shipment_address (shipmentid bigint(20) not null, addressid bigint(20) not null, primary key(shipmentid, addressid))",\n "create table dummy (shipmentid varchar(20), trackingid varchar(15) default null, createdatetime datetime default current_timestamp, primary key (shipmentid))",\n "create table shipment_status(shipmentid bigint(20) not null, updatedatetime datetime default current_timestamp, status tinyint(1))"\n ],\n "ExecutorErr": "Schema change: 'drop table if exists shipment' does not introduce any table definition change.",\n "TotalTimeSpent": 45414411\n}\n"

Ideally it should gracefully handle the drop table if not exists and create the table.

@ashudeep-sharma-zz
Copy link
Contributor Author

I verified that this has some problems when drop and creation of the table happens in the same script, if i try to run a script with the following commands:
drop table if exists shipment;
drop table if exists runsheet;
drop table if exists runsheet_shipments;
drop table if exists shipment_address;
drop table if exists shipment_status;
drop table if exists dummy;

It works great.

This seems to be a valid script:
drop table if exists shipment;
create table shipment (shipmentid bigint(20) not null, trackingid varchar(15) default null, createdatetime datetime default current_timestamp,isactive tinyint(1) not null default 0, primary key (shipmentid));
drop table if exists runsheet;
create table runsheet (runsheetid bigint(20) not null, createdatetime datetime default current_timestamp, isactive tinyint(1) not null default 1, deliverydate datetime default null, primary key (runsheetid));
drop table if exists runsheet_shipments;
create table runsheet_shipments(runsheetid bigint(20), shipmentid bigint(20), primary key(runsheetid, shipmentid));
drop table if exists shipment_address;
create table shipment_address (shipmentid bigint(20) not null, addressid bigint(20) not null, primary key(shipmentid, addressid));
drop table if exists dummy;
create table dummy (shipmentid varchar(20), trackingid varchar(15) default null, createdatetime datetime default current_timestamp, primary key (shipmentid));
drop table if exists shipment_status;
create table shipment_status(shipmentid bigint(20) not null, updatedatetime datetime default current_timestamp, status tinyint(1));

but couldn;t be executed using ApplySchema, it gives the following exception:

E0526 09:36:31.120971 13538 main.go:43] Remote error: rpc error: code = 13 desc = invalid header field value "Schema change failed, ExecuteResult: {\n "FailedShards": null,\n "SuccessShards": null,\n "CurSQLIndex": 0,\n "Sqls": [\n "drop table if exists shipment",\n "create table shipment (shipmentid bigint(20) not null, trackingid varchar(15) default null, createdatetime datetime default current_timestamp,isactive tinyint(1) not null default 0, primary key (shipmentid))",\n "drop table if exists runsheet",\n "create table runsheet (runsheetid bigint(20) not null, createdatetime datetime default current_timestamp, isactive tinyint(1) not null default 1, deliverydate datetime default null, primary key (runsheetid))",\n "drop table if exists runsheet_shipments",\n "create table runsheet_shipments(runsheetid bigint(20), shipmentid bigint(20), primary key(runsheetid, shipmentid))",\n "drop table if exists shipment_address",\n "create table shipment_address (shipmentid bigint(20) not null, addressid bigint(20) not null, primary key(shipmentid, addressid))",\n "drop table if exists dummy",\n "create table dummy (shipmentid varchar(20), trackingid varchar(15) default null, createdatetime datetime default current_timestamp, primary key (shipmentid))",\n "drop table if exists shipment_status",\n "create table shipment_status(shipmentid bigint(20) not null, updatedatetime datetime default current_timestamp, status tinyint(1))"\n ],\n "ExecutorErr": "rpc error: code = 13 desc = invalid header field value \"TabletManager.PreflightSchema on test-0000000200 error: /usr/bin/mysql: ERROR 1050 (42S01) at line 3: Table 'shipment' already exists\n\"",\n "TotalTimeSpent": 206411085\n}\n"

@michael-berlin
Copy link
Contributor

You're seeing the error:

Schema change: 'drop table if exists shipment' does not introduce any table definition change.

It comes from here:

https://github.com/youtube/vitess/blob/master/go/vt/schemamanager/tablet_executor.go#L170

ApplySchema runs every query on a temporary database first (preflight mode) and checks if it actually results into a change of the schema. Since the table didn't exist in the first place, the "drop table if exists" query didn't have any impact.

The solution is to exclude such queries from the check in tablet_executor.go. An ideal solution would use the SQL parser to check if it's a "drop table if exists" statement. As a quick hack, you could also do a string comparison to exclude it.

I won't have time to work on this in the next days. Let me know if you're interested on fixing it.

@michael-berlin michael-berlin changed the title ApplySchema currently fails for some sql commands ApplySchema fails on DDLs which don't result in a schema change (e.g. "drop table if exists" on a non-existing table) May 26, 2016
@ashudeep-sharma-zz
Copy link
Contributor Author

Sure, I can take it up, will get into the details and come back in case i have any concerns.

@ashudeep-sharma-zz
Copy link
Contributor Author

I think there is more to this then if the schema doesn;t introduce any table definition change.
I considered a second scenario in which my tables actually exists in the shards and then execute these queries, which introduce a table definition change in every statement.
drop table if exists shipment;
create table shipment (shipmentid bigint(20) not null, trackingid varchar(15) default null, createdatetime datetime default current_timestamp,isactive tinyint(1) not null default 0, primary key (shipmentid));
drop table if exists runsheet;
create table runsheet (runsheetid bigint(20) not null, createdatetime datetime default current_timestamp, isactive tinyint(1) not null default 1, deliverydate datetime default null, primary key (runsheetid));
drop table if exists runsheet_shipments;
create table runsheet_shipments(runsheetid bigint(20), shipmentid bigint(20), primary key(runsheetid, shipmentid));
drop table if exists shipment_address;
create table shipment_address (shipmentid bigint(20) not null, addressid bigint(20) not null, primary key(shipmentid, addressid));
drop table if exists dummy;
create table dummy (shipmentid varchar(20), trackingid varchar(15) default null, createdatetime datetime default current_timestamp, primary key (shipmentid));
drop table if exists shipment_status;
create table shipment_status(shipmentid bigint(20) not null, updatedatetime datetime default current_timestamp, status tinyint(1));

"ExecutorErr": "rpc error: code = 13 desc = invalid header field value \"TabletManager.PreflightSchema on test-0000000200 error: /usr/bin/mysql: ERROR 1050 (42S01) at line 3: Table 'shipment' already exists

Just want to checkout are these queries executed concurrently or serially?

@ashudeep-sharma-zz
Copy link
Contributor Author

Attaching the Vitess Components logs for the reference, this time i tried executing the following queries:
drop table if exists shipment;
drop table if exists runsheet;
drop table if exists runsheet_shipments;
drop table if exists dummy;
drop table if exists shipment_status;
drop table if exists shipment_address;
create table shipment (shipmentid bigint(20) not null, trackingid varchar(15) default null, createdatetime datetime default current_timestamp,isactive tinyint(1) not null default 0, primary key (shipmentid));
create table runsheet (runsheetid bigint(20) not null, createdatetime datetime default current_timestamp, isactive tinyint(1) not null default 1, deliverydate datetime default null, primary key (runsheetid));
create table runsheet_shipments(runsheetid bigint(20), shipmentid bigint(20), primary key(runsheetid, shipmentid));
create table shipment_address (shipmentid bigint(20) not null, addressid bigint(20) not null, primary key(shipmentid, addressid));
create table dummy (shipmentid varchar(20), trackingid varchar(15) default null, createdatetime datetime default current_timestamp, primary key (shipmentid));
create table shipment_status(shipmentid bigint(20) not null, updatedatetime datetime default current_timestamp, status tinyint(1));

Also validated that before executing these queries all the 6 tables were existing in both the shards.
VitessLogs.txt

@ashudeep-sharma-zz
Copy link
Contributor Author

ashudeep-sharma-zz commented Jun 1, 2016

@michael-berlin @sougou
On checking the code flow, i found that each sql statement from the script specified in the ApplySchema command are applied individually to the BaseSchema and not sequentially in the way its being specified in the script. Is there any reason for applying the statements one at a time only to the BaseSchema rather than applying it sequentially on the Base Schema?

I am referring to this code in tablet_executor.go
func (exec *TabletExecutor) preflightSchemaChanges(ctx context.Context, sqls []string) error { exec.schemaDiffs = make([]*tmutils.SchemaChangeResult, len(sqls)) **_for i := range sqls { schemaDiff, err := exec.tmClient.PreflightSchema( ctx, exec.tablets[0], sqls[i]) if err != nil { fmt.Println(err) return err }_** exec.schemaDiffs[i] = schemaDiff diffs := tmutils.DiffSchemaToArray( "BeforeSchema", exec.schemaDiffs[i].BeforeSchema, "AfterSchema", exec.schemaDiffs[i].AfterSchema) if len(diffs) == 0 { return fmt.Errorf("Schema change: '%s' does not introduce any table definition change.", sqls[i]) } } return nil }

@ashudeep-sharma-zz
Copy link
Contributor Author

@michael-berlin @sougou
So as a part of this fix, i think that all the statements given in the script should be combined and executed sequentially in one go. Also there should be some check for DDLS like drop etc which doesn;t introduce any schema change.
Not sure why its an error if the DDL doesn;t introduce any schema change, shouldn;t it be a warning with some proper message.
Let me know your thoughts and concerns on this.

@michael-berlin
Copy link
Contributor

So as a part of this fix, i think that all the statements given in the script should be combined and executed sequentially in one go.

I agree as far as the preflight is concerned. I'm working on this right now and will publish a pull request for it later.

@michael-berlin
Copy link
Contributor

me wrote:

I agree as far as the preflight is concerned. I'm working on this right now and will publish a pull request for it later.

Fix and e2e test here: #1749

michael-berlin added a commit to michael-berlin/vitess that referenced this issue Jun 1, 2016
…es instead of a single change.

This makes it easier to test multiple changes which depend on each other e.g. a CREATE TABLE after a DROP TABLE if the table already exists.

Fixes issue vitessio#1731 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants