Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional tests for outer join push downs #14841

Merged
merged 3 commits into from
Nov 18, 2022

Conversation

vlad-lyutenko
Copy link
Contributor

@vlad-lyutenko vlad-lyutenko commented Oct 31, 2022

Description

In Implement Join pushdown for JDBC connectors we implemented join pushdown for JDBC connectors.

As part of that a test BaseJdbcConnectorTest#testJoinPushdown was added which verifies that LEFT, RIGHT and FULL joins get pushed down with all possible operators (=, !=, <, <=, >, >=, IS DISTINCT FROM, IS NOT DISTINCT FROM).

This PR is initial attempt to add more test cases for outer join pushdowns

Non-technical explanation

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 31, 2022
hasBehavior(SUPPORTS_JOIN_PUSHDOWN_WITH_VARCHAR_EQUALITY),
joinOverTableScans);

// multiple bigint predicates
assertThat(query(session, "SELECT n.name, c.name FROM nation n JOIN customer c ON n.nationkey = c.nationkey and n.regionkey = c.custkey"))
assertThat(query(session, format("SELECT n.name, c.name FROM nation n %s customer c ON n.nationkey = c.nationkey and n.regionkey = c.custkey", joinOperator)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next part // inequality - is tricky one, not sure why but queries with most of inequality operators start to be fully pushdown with OUTER joins (working on this to understand why)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is also added now, the problem was that we treat INNER join in this case as CROSS JOIN and disable push down for such cases:

public class PushJoinIntoTableScan
        implements Rule<JoinNode>
    @Override
    public Result apply(JoinNode joinNode, Captures captures, Context context)
    {
        if (joinNode.isCrossJoin()) {
            return Result.empty();
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem was that we treat INNER join in this case as CROSS JOIN

Join with no equi conditions gets planned as

FilterNode
- CrossJoin
  - Source A
  - Source B

and disable push down for such cases:

a safety measure

But we need to match the plan patterns like above and run Join pushdown for these as well.
I guiess @wendigo may be working on this. I remember explaining this to him.

@vlad-lyutenko
Copy link
Contributor Author

Looks like Postgres doesn't support FULL OUTER JOINs with inequality operators:

io.trino.spi.TrinoException: ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
   at io.trino.plugin.jdbc.JdbcRecordCursor.handleSqlException(JdbcRecordCursor.java:305)
   at io.trino.plugin.jdbc.JdbcRecordCursor.advanceNextPosition(JdbcRecordCursor.java:179)
   at io.trino.spi.connector.RecordPageSource.getNextPage(RecordPageSource.java:88)
   at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:311)
   at io.trino.operator.Driver.processInternal(Driver.java:411)
   at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
   at io.trino.operator.Driver.tryWithLock(Driver.java:706)
   at io.trino.operator.Driver.process(Driver.java:306)
   at io.trino.operator.Driver.processForDuration(Driver.java:277)
   at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:739)
   at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
   at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
   at io.trino.$gen.Trino_testversion____20221103_121138_1.run(Unknown Source)
   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
   at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.sql.SQLException: ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
   at io.trino.plugin.jdbc.JdbcRecordCursor.advanceNextPosition(JdbcRecordCursor.java:159)
   ... 14 more
Caused by: java.util.concurrent.ExecutionException: org.postgresql.util.PSQLException: ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
   at io.trino.plugin.jdbc.JdbcRecordCursor.advanceNextPosition(JdbcRecordCursor.java:154)
   ... 14 more

So I disabled this type of JOINs for postgres client, not sure it should be in this PR or separate, and maybe we could disable it in more smart way like not all OUTER joins but only for inequality operators

cc @findepi @hashhar

@@ -970,6 +970,10 @@ public Optional<PreparedQuery> implementJoin(
Map<JdbcColumnHandle, String> leftAssignments,
JoinStatistics statistics)
{
if (joinType == JoinType.FULL_OUTER) {
// FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this suggests in future we should look into making available the joinType to isSupportedJoinCondition so that we can add logic to check that for FULL_OUTER the joinCondition is as Postgres supports and allow pushdown in those cases.

(No change requested but maybe we create an issue about possible future enhancement).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue added #14929

hasBehavior(SUPPORTS_JOIN_PUSHDOWN_WITH_VARCHAR_EQUALITY),
joinOverTableScans);

// multiple bigint predicates
assertThat(query(session, "SELECT n.name, c.name FROM nation n JOIN customer c ON n.nationkey = c.nationkey and n.regionkey = c.custkey"))
assertThat(query(session, format("SELECT n.name, c.name FROM nation n %s customer c ON n.nationkey = c.nationkey and n.regionkey = c.custkey", joinOperator)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem was that we treat INNER join in this case as CROSS JOIN

Join with no equi conditions gets planned as

FilterNode
- CrossJoin
  - Source A
  - Source B

and disable push down for such cases:

a safety measure

But we need to match the plan patterns like above and run Join pushdown for these as well.
I guiess @wendigo may be working on this. I remember explaining this to him.

@DataProvider
public Object[][] joinOperators()
{
if (hasBehavior(SUPPORTS_JOIN_PUSHDOWN_WITH_FULL_JOIN)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when ! has behavior SUPPORTS_JOIN_PUSHDOWN_WITH_FULL_JOIN, it should be verified that FULL JOIN pushdown isn't supported.
otherwise the connectors' declarations won't be tested for truthfulness (and will be wrong)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -970,6 +970,10 @@ public Optional<PreparedQuery> implementJoin(
Map<JdbcColumnHandle, String> leftAssignments,
JoinStatistics statistics)
{
if (joinType == JoinType.FULL_OUTER) {
// FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
return Optional.empty();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like disabling a lot of functionality. However, "merge- or hash- joinable" conditions sounds like "equality and inequality", so all the comparison expressions?
Can we support outer join with some conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can have FULL OUTER JOIN only for equality =,
for all others like <, <=, <>, DISTINCT we got exception,
I created issue for this - #14929

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments.


// join over join
assertThat(query(session, "SELECT * FROM nation n, region r, customer c WHERE n.regionkey = r.regionkey AND r.regionkey = c.custkey"))
.isFullyPushedDown();
}
}

@DataProvider
public Object[][] joinOperators()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operators is the "execution" term while "type" is the SQL term maybe? Would joinTypes be better? (especially since there's no seaprate join operator for each of these "operators").

(No change requested, just seeking opinion from others).

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/outer-join-tests branch 3 times, most recently from 7769acc to 33e66f3 Compare November 15, 2022 11:37
@DataProvider
public Object[][] joinOperators()
{
return new Object[][] {{JOIN}, {LEFT_JOIN}, {RIGHT_JOIN}, {FULL_JOIN}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems all values from JoinOperator are enlisted here, so JoinOperator.values() + DataProviders#toDataProvider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines +37 to +39
public String toString()
{
return value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice trick

Comment on lines +16 to +21
public enum JoinOperator
{
JOIN("JOIN"),
LEFT_JOIN("LEFT JOIN"),
RIGHT_JOIN("RIGHT JOIN"),
FULL_JOIN("FULL JOIN"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it 1-to-1 mapping with io.trino.spi.connector.JoinType ?
have you considered to reuse JoinType?
have you considered to verify that JoinOperator contains all values of JoinType?
Do you anticipate other values can be present here in the future?

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this operator it's more about how JOIN condition is present in actual String query,
for example in future or for some connectors (in case of some bugs) we could add/write LEFT OUTER JOIN instead/additionally to LEFT JOIN. (however this is the same type of join)
So I'd prefer to keep these things separately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to tackle already defined enums and their string representation elegantly.
And this does not seem to change frequently or any time soon.
so may be at least some verify (JoinOperator.values.size() vs JoinType.values().size())?

or for some connectors (in case of some bugs) we could add/write LEFT OUTER JOIN instead/additionally to LEFT JOIN.

don't see how to do that easily with current implementation, but I think it does not matter here/now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't see how to do that easily with current implementation, but I think it does not matter here/now.

you can just add to enum :
...
LEFT_OUTER_JOIN("LEFT OUTER JOIN"),
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so may be at least some verify (JoinOperator.values.size() vs JoinType.values().size())?

For me it's not mapping to actual joins, maybe another naming will help - like JoinOperatorStringRepresentation, don't know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to decouple the following three things:

  1. Join types on the plan level (represented by JoinNode.Type)
  2. Join types on SPI level (represented by JoinType)
  3. Mapping SPI join types to SQL strings (represented by JoinOperators here).

There's no reason for 1:1 mapping between 1 and 2.
There also no reason for 1:1 mapping between 2 and 3. e.g. LEFT OUTER can appear in SQL text as LEFT or LEFT OUTER - both are same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to verify JoinType values are subset of JoinOperator but it sounds premature - it's only a problem when someone implements a new Join node, adds plan optimizer to push down to table scan, implements in some connector - all of this without adding tests.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/outer-join-tests branch 2 times, most recently from 54a94a5 to db7755e Compare November 16, 2022 12:17
Copy link
Contributor

@ssheikin ssheikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

This is easier to use than using isNotFullyPushedDown in cases when you
only need to verify some PlanNode still exists and was not consumed by
the connector instead of verifying exact shape of some sub-plan.
This uncovers a bug in Postgres connector that pushing down FULL OUTER join
queries with inequality join conditions fails with an error like "FULL JOIN is
only supported with merge-joinable or hash-joinable join conditions".

So FULL OUTER join pushdown is disabled for Postgres connector at the moment.
assertJoinConditionallyPushedDown is simpler to use and more generic as
it doesn't get affected by exact plan shape.
@hashhar hashhar force-pushed the vlad-lyutenko/outer-join-tests branch from db7755e to fcee24d Compare November 16, 2022 19:05
@hashhar
Copy link
Member

hashhar commented Nov 16, 2022

just reworded + rebased (since a new release was done) to avoid logical conflict if they exist

@vlad-lyutenko
Copy link
Contributor Author

let's wait with merge

@vlad-lyutenko
Copy link
Contributor Author

I think we are ok now to move forward

@hashhar hashhar merged commit 588b9f5 into trinodb:master Nov 18, 2022
@hashhar hashhar mentioned this pull request Nov 18, 2022
@github-actions github-actions bot added this to the 404 milestone Nov 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants