Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip database qualifier from VIEW definitions in GetSchema #9655

Closed
wants to merge 2 commits into from

Conversation

mattlord
Copy link
Contributor

@mattlord mattlord commented Feb 9, 2022

Description

The TabletManager GetSchema RPC call results for SQL Views have the database name replaced by the template notation {{.DatabaseName}} when what we should do instead is remove the database name qualifier so that CREATE VIEW clauses match CREATE TABLE clauses in the results — meaning that they do not have the database name qualifier.

The difference in the base output coming from MySQL is due to the difference between what's shown in the SHOW CREATE VIEW output and what's in information_schema.views.

You can see a demonstration here...

Here's the basic structures created and viewed from a vtgate:

mysql> create table t1 (id int)
Query OK, 0 rows affected (0.01 sec)

mysql> create view v1 as select id from t1;
Query OK, 0 rows affected (0.01 sec)

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> show create view v1\G
*************************** 1. row ***************************
                View: v1
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `t1`.`id` AS `id` from `t1`
character_set_client: utf8mb4
collation_connection: utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> select * from information_schema.views where table_name="v1"\G
*************************** 1. row ***************************
       TABLE_CATALOG: def
        TABLE_SCHEMA: vt_customer
          TABLE_NAME: v1
     VIEW_DEFINITION: select `vt_customer`.`t1`.`id` AS `id` from `vt_customer`.`t1`
        CHECK_OPTION: NONE
        IS_UPDATABLE: YES
             DEFINER: vt_app@localhost
       SECURITY_TYPE: DEFINER
CHARACTER_SET_CLIENT: utf8
COLLATION_CONNECTION: utf8_general_ci
1 row in set (0.00 sec)

Here's what we see in the Vitess schema:

Before:

$ vtctlclient GetSchema -include-views=true -tables "t1,v1" zone1-200 | jq -r '.table_definitions[].schema'
CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE ALGORITHM=UNDEFINED DEFINER=`vt_app`@`localhost` SQL SECURITY DEFINER VIEW {{.DatabaseName}}.`v1` AS select {{.DatabaseName}}.`t1`.`id` AS `id` from {{.DatabaseName}}.`t1`

After:

$ vtctlclient GetSchema -include-views=true -tables "t1,v1" zone1-200 | jq -r '.table_definitions[].schema'
CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE ALGORITHM=UNDEFINED DEFINER=`vt_app`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `t1`.`id` AS `id` from `t1`

Related Issue(s)

Checklist

  • Should this PR be backported? NO
  • Test was added
  • Documentation is not required

Signed-off-by: Matt Lord <mattalord@gmail.com>
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see inline notes

// so that the results for Views match:
// 1. The SHOW CREATE VIEW output from MySQL
// 2. The CREATE TABLE output in these results, which does not have the DB name qualifier
norm = strings.Replace(norm, backtickDBName+".", "", -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wary of strings.Replace() as means to manipulate queries. What happens if there's a column that has same name as the database? At the very least I'd say lets replace -1 with 1:

Suggested change
norm = strings.Replace(norm, backtickDBName+".", "", -1)
norm = strings.Replace(norm, backtickDBName+".", "", 1)

but then, when I experiment on my local machine:

master [localhost:21324] {msandbox} (test) > show create table `test`.`v` \G
*************************** 1. row ***************************
                View: v
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`msandbox`@`localhost` SQL SECURITY DEFINER VIEW `v` AS select `t`.`id` AS `id`,`t`.`i` AS `i`,`t`.`tx` AS `tx` from `t`
character_set_client: utf8mb4
collation_connection: utf8mb4_0900_ai_ci

I don't see a database qualifier in the resulting CREATE VIEW statement. therefore, should there be a column named test, it will be wrongly replaced.

I realize the bug predates this PR. But then, I also see no testing to this logic. Should we seize the opportunity to fix this? We can e.g. sqlparser.ParseStatement into a CreateView, then remove the schema qualifier, then export back via sqlparser.String().

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about the vtctl GetSchema results and the RPC it uses. You can see an example if you exapand Here's a demonstration: in the description.

I had similar concerns about the potential for problems, but since we already had that issue in this code I set it aside. I can explore alternatives though, I'm going to move this back to draft anyway as I'm adding a new e2e test for GetSchema in vitess.io/vitess/go/test/endtoend/tabletmanager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. I actually missed the Here's a demonstration: section.

I agree that the solution in this PR is better than already existing code.

@mattlord mattlord marked this pull request as draft February 9, 2022 07:59
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord
Copy link
Contributor Author

mattlord commented Feb 9, 2022

@rohit-nayak-ps, @deepthi, and @shlomi-noach so I don't think we want to go this route after sleeping on it. The reasons being:

  1. Views and table objects exist in a single schema
  2. BUT, views can query tables across schemas. This is why the internal/I_S dictionary information has the fully qualified identifiers (table and column names) in the view definition
  3. VReplication uses this same GetSchema TabletManager RPC via CopySchemaShardFromShard which calls CopySchemaShard and then SchemaDefinitionToSQLStrings for MoveTables operations. The {{.DatabaseName}} template token is then replaced using go templating to convert the source database name to the target database name in applySQLShard.

So if we wanted to ignore/remove the token output from this RPC, I think it should be on the caller side. Any thoughts?

@mattlord
Copy link
Contributor Author

mattlord commented Feb 9, 2022

Closing this as it's not 100% safe to do within Vitess and it will instead be done at higher layers on the caller side when it's known to be safe.

I moved the new test to its own PR: #9658

@mattlord mattlord closed this Feb 9, 2022
@mattlord mattlord deleted the create_view_output branch February 9, 2022 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants