Rewrite joins written with the USING construct #6660

Merged
merged 10 commits into from Dec 8, 2020

systay
Collaborator

@systay systay commented Sep 2, 2020

If a join has been expressed as:

```
FROM tblA JOIN tblB USING (col1,col2)
```

it will be rewritten to

```
FROM tblA JOIN tblB ON tblA.col1 = tblB.col1 AND tblA.col2 = tblB.col2
```

This allows our planner to recognize these queries and plan them correctly.
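As an illustration of the rewrite described above, here is a minimal string-level sketch. The helper name `rewrite_using` is hypothetical and is not taken from this PR; the actual change operates on the parsed query inside the Vitess rewriter, which is written in Go. The sketch only shows the shape of the transformation: one equality predicate per `USING` column, joined with `AND`.

```python
def rewrite_using(left: str, right: str, columns: list[str]) -> str:
    """Build the ON form of `FROM left JOIN right USING (columns)`.

    Hypothetical helper for illustration only; not the Vitess implementation.
    """
    if not columns:
        raise ValueError("USING needs at least one column")
    # One equality per USING column, qualified with each side's table name.
    predicates = [f"{left}.{col} = {right}.{col}" for col in columns]
    return f"FROM {left} JOIN {right} ON " + " AND ".join(predicates)


print(rewrite_using("tblA", "tblB", ["col1", "col2"]))
# prints: FROM tblA JOIN tblB ON tblA.col1 = tblB.col1 AND tblA.col2 = tblB.col2
```

Note that this covers only the join condition. It says nothing about which columns end up in the result set.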

Signed-off-by: Andres Taylor <andres@planetscale.com>
@systay systay requested a review from sougou as a code owner September 2, 2020 08:20
@derekperkins
Member

Will this collapse the USING columns on the way out? AFAIK, that's the only functional difference between these two joins.

@sougou
Contributor

sougou commented Sep 2, 2020

I vaguely remember that there was a specific corner case where this approach wouldn't work. I'll need to dig into some more info to find out why.

@sougou
Contributor

sougou commented Sep 2, 2020

Ok. I remember now. It becomes ambiguous in the case of multi-table joins (if we don't know the schema). We'll need to lookup the documentation.
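To make the multi-table ambiguity concrete, here is a small sketch under stated assumptions. The function `resolve_using` and both schemas are hypothetical, invented for illustration. For `a JOIN b JOIN c USING (y)`, the left-hand side of the `ON` predicate depends on which of the earlier tables actually has a column `y`, and that cannot be decided from the query text alone. As a simplification, the sketch treats "no owner" and "more than one owner" the same way, as unresolved.

```python
def resolve_using(left_tables, right_table, column, schema=None):
    """Return the ON predicate for `<left_tables> JOIN right_table USING (column)`.

    `schema` maps table name -> set of column names. Hypothetical helper.
    """
    if schema is None:
        # Without schema information, only the two-table case is unambiguous.
        if len(left_tables) != 1:
            raise ValueError(
                f"cannot tell which of {left_tables} owns column {column!r} without the schema"
            )
        owners = list(left_tables)
    else:
        owners = [t for t in left_tables if column in schema[t]]
        if len(owners) != 1:
            raise ValueError(f"column {column!r} is not owned by exactly one of {left_tables}")
    return f"{owners[0]}.{column} = {right_table}.{column}"


# Same query text, two different schemas, two different rewrites.
schema_1 = {"a": {"x", "y"}, "b": {"x"}, "c": {"y"}}
schema_2 = {"a": {"x"}, "b": {"x", "y"}, "c": {"y"}}
print(resolve_using(["a", "b"], "c", "y", schema_1))  # a.y = c.y
print(resolve_using(["a", "b"], "c", "y", schema_2))  # b.y = c.y
```

With exactly two tables there is only one candidate on each side, so no schema lookup is needed.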

Signed-off-by: Andres Taylor <andres@planetscale.com>
@systay
Collaborator Author

systay commented Sep 3, 2020

> Ok. I remember now. It becomes ambiguous in the case of multi-table joins (if we don't know the schema)

The differences are all around which columns are available in the result set.

Here are two examples that show differences that we would need to handle:

> select a from t as t1 join t as t2 on t1.a = t2.a limit 1;
(1052, "Column 'a' in field list is ambiguous")

> select a from t as t1 join t as t2 using (a) limit 1;
+---+
| a |
+---+
| 0 |
+---+
1 row in set
Time: 0.016s
> select * from t as t1 join t as t2 using (a) limit 1;
+---+
| a |
+---+
| 0 |
+---+
1 row in set
Time: 0.013s
> select * from t as t1 join t as t2 on t1.a = t2.a limit 1;
+---+---+
| a | a |
+---+---+
| 0 | 0 |
+---+---+
1 row in set
Time: 0.008s

TL;DR: Changing the JOIN also changes the output, which I did not expect. This rewrite is not as easy as I first thought. Backing off for now. :(
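The same differences can be reproduced in a self-contained script. This sketch uses Python's built-in `sqlite3` module as a stand-in engine, which is an assumption: the session above was run against MySQL, and SQLite's error text differs. The column-collapsing behaviour of `USING` is the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.execute("INSERT INTO t VALUES (0)")


def column_names(sql):
    """Run a query and return the names of its result columns."""
    cur = conn.execute(sql)
    return [desc[0] for desc in cur.description]


# SELECT *: USING collapses the shared column, ON keeps one copy per table.
using_cols = column_names("SELECT * FROM t AS t1 JOIN t AS t2 USING (a) LIMIT 1")
on_cols = column_names("SELECT * FROM t AS t1 JOIN t AS t2 ON t1.a = t2.a LIMIT 1")
print(len(using_cols), len(on_cols))

# Unqualified column: accepted with USING, rejected as ambiguous with ON.
using_rows = conn.execute(
    "SELECT a FROM t AS t1 JOIN t AS t2 USING (a) LIMIT 1"
).fetchall()
try:
    conn.execute("SELECT a FROM t AS t1 JOIN t AS t2 ON t1.a = t2.a LIMIT 1")
    on_error = None
except sqlite3.OperationalError as exc:
    on_error = str(exc)
print(using_rows, on_error)
```

A rewrite from `USING` to `ON` therefore preserves the result only when every selected column is explicitly qualified and `SELECT *` is not used.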

@systay systay closed this Sep 3, 2020
@GuptaManan100 GuptaManan100 reopened this Dec 8, 2020
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
go/vt/vtgate/planbuilder/builder.go (outdated, resolved)
Comment on lines +2082 to +2084
# join with USING construct
"select user.id from user join user_extra using(id)"
{
Member

Add a test with a non-vindex column in the projection to see if that works.

Collaborator Author

We're not touching this logic at all. All we are doing is rewriting USING to ON in the rewriter.

Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
@GuptaManan100
Member

Fixes #7119

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
@systay systay merged commit fc0fdeb into vitessio:master Dec 8, 2020
@systay systay deleted the join-using branch December 8, 2020 13:42
@systay
Collaborator Author

systay commented Dec 8, 2020

This PR adds very limited support for JOIN ... USING.

Limitations:

  • SELECT * is already not supported for scatter joins, and it is doubly unsupported for this form of join syntax.
  • No more than two tables can be joined this way.

Some frameworks seem to need at least this form (#7119), and so this PR seemed like an acceptable path forward in lieu of more substantial reworking of the join planning.

@askdba askdba added this to the v9.0 milestone Dec 10, 2020

Successfully merging this pull request may close these issues.

Rails migrations fail with error "vtgate: two predicates for table_name not supported"
6 participants