Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Planning is not deterministic #21637

Closed
findepi opened this issue Apr 19, 2024 · 3 comments · Fixed by #21638
Closed

Planning is not deterministic #21637

findepi opened this issue Apr 19, 2024 · 3 comments · Fixed by #21638
Labels
bug Something isn't working

Comments

@findepi
Copy link
Member

findepi commented Apr 19, 2024

This can be observed with SQL Server connector:

// Add this to TestSqlServerConnectorTest.java
@Test
public void testSqlServerJoinPushdown()
{
    Session session = joinPushdownEnabled(getSession());

    Runnable assertJoinWithDistinct = () ->
            assertThat(query(session, "SELECT n.name, c.name FROM nation n JOIN customer c ON n.nationkey = c.nationkey AND n.regionkey IS DISTINCT FROM c.custkey"))
                    .joinIsNotFullyPushedDown();

    for (int i = 0; i < 100; i++) {
        // assertJoinWithDistinct passes reliably
        assertJoinWithDistinct.run();
    }

    assertThat(query(session, "SELECT n.name, c.name FROM nation n JOIN customer c ON n.nationkey = c.nationkey and n.regionkey = c.custkey"))
            // .skipResultsCorrectnessCheckForPushdown() // with this uncommented, the assertion below would pass
            .isFullyPushedDown();

    // assertJoinWithDistinct does not pass anymore, the join gets pushed down (without IS DISTINCT FROM predicate)
    assertJoinWithDistinct.run();
}

Note: i am not aware of any caching done by the connector, but the plans are sensitive to stats returned by the connector, and SQL Server's state may be changing when we issue queries.

@findepi findepi added bug Something isn't working RELEASE-BLOCKER and removed RELEASE-BLOCKER labels Apr 19, 2024
@findepi
Copy link
Member Author

findepi commented Apr 19, 2024

marked release blocker. not because join pushdown in SQL server is so important (it is not), but because of the engine side of the problem.

the planner apparently puts

n.nationkey = c.nationkey AND n.regionkey IS DISTINCT FROM c.custkey

as a join condition.
but then, sometimes after ReorderJoin, the IS DISTINCT FROM moves to FilterNode above the JoinNode.
the plan is equivalent, but

  • some pushdowns may start or stop to be applied
  • when pushdown doesn't happen, the join operator will produce more data, unless we put the filter back into JoinNode.

@martint
Copy link
Member

martint commented Apr 19, 2024

The non-determinism is because the SQL server connector is returning stats non-deterministically and that the connector is handling pushdown inconsistently for filter(join(r, l), <f>) vs join(r, l, <f>).

If I recall correctly, for the first execution, the connector doesn't return column stats. For the second one, it does. This results in a different join order and slightly different query plan that the connector refuses to push down.

@findepi
Copy link
Member Author

findepi commented Apr 19, 2024

Yes, the plan are different.
When SQL Server connector is offered a join pushdown with n.nationkey = c.nationkey AND n.regionkey IS DISTINCT FROM c.custkey condition, it rejects the offer (it does not support IS DISTINCT FROM predicates).

When ReorderJoins kicks in, the join condition becomes n.nationkey = c.nationkey and the resst (n.regionkey IS DISTINCT FROM c.custkey) becomes a filter above. When SQL Server connector is offered a join pushdown with n.nationkey = c.nationkey, it can accept it (cost-based, or eaerly depending on configuration).

Both plans are obviously equivalent (produce same relations) and correct. The fact that engine sometimes presents more and sometimes fewer join conditions to the connector is something undesirable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

Successfully merging a pull request may close this issue.

2 participants