Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass relation statistics to applyJoin in ConnectorMetadata #7000

Merged

Conversation

losipiuk
Copy link
Member

No description provided.

*
* As soon as we have cost based logic to determine join pushdown feasibility we
* will change the default to automatic.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK for now, but we will actually want to enable it by default on the engine side, and then disable by default on the connector side.

Thus, the engine side should be simplified to be true/false toggle, and true by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see it's being removed in next commit. Thanks for having this in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With new approach we should probably drop JoinPushdownMode altogether and just have enable-join-pushdown config property (and matching session property).
WDYT @findepi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and BTW this is what i was trying to express. You just did it way better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are very kind :)

*
* As soon as we have cost based logic to determine join pushdown feasibility we
* will change the default to automatic.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I should have left the initial part of the comment I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this PR -- sure. But ultimately we want to enable this by default in engine, as we transfer the responsibility onto connectors.

@@ -128,7 +134,8 @@ public Result apply(JoinNode joinNode, Captures captures, Context context)
filterSplitResult.getPushableConditions(),
// TODO we could pass only subset of assignments here, those which are needed to resolve filterSplitResult.getPushableConditions
leftAssignments,
rightAssignments);
rightAssignments,
getJoinStatistics(joinNode, left, right, context));
Copy link
Member

@findepi findepi Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here or at getJoinStatistics() definition, include the comment why we're doing this, as this is not obvious. One could think connector could determine this on its own, at least for table scans.

The reasons are:

  • the engine knows how to estimate join and connector may not
  • the engine may have cached stats for the table scans (within context.getStatsProvider()), so can be able to provide information more inexpensively
  • in the future, the engine can be able to provide stats for table scan even in case when connector no longer can, linking to Remember best known statistics for TableHandle on engine side #6998
  • the logic can be different (configured differently) for separate catalogs or separate connectors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@losipiuk edited, added fourth bullet)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the comment be rather in SPI method javadoc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I put it in rule for now)

@martint
Copy link
Member

martint commented Feb 23, 2021

Why would the connector be able to make a decision based on the stats for left/right/join that the engine cannot do?

@losipiuk
Copy link
Member Author

losipiuk commented Feb 23, 2021

Why would the connector be able to make a decision based on the stats for left/right/join that the engine cannot do?

@martint The thinking here is that you need to have deep understanding of underlying data source to be able to perform informed decision.
An example question to be answered would be "do tables we are trying to join have indexes on join column?", engine does not have a clue.

We would also like to be able to configure whether join is to happen EAGERLY or cost based on per-connector basis, and to have that ultimately connector need to decide.

@findepi
Copy link
Member

findepi commented Feb 23, 2021

@martint see also #7000 (comment).
The reasoning is important and will be captured in a code comment.

@losipiuk losipiuk force-pushed the lo/oportunistic-join-pushdown-cost-based-2 branch from b66310d to 0abd250 Compare February 23, 2021 19:33
Pass left, right and join statistics to connector via applyJoin, to
allow connector make informed decision if it makes sense to perform join
pushdown. Given how heterogenous connectors are, and how different
internal capabilities each one have, we cannot make reasonable decision
on the engine side.
@losipiuk losipiuk force-pushed the lo/oportunistic-join-pushdown-cost-based-2 branch from 0abd250 to c01fb5d Compare February 23, 2021 21:18
@losipiuk
Copy link
Member Author

AC

@losipiuk losipiuk force-pushed the lo/oportunistic-join-pushdown-cost-based-2 branch from c01fb5d to 262615a Compare February 24, 2021 12:22
We are moving decision process if we should pursue with join pushdown to
connectors. On the engine side we are leaving global switch to disable
all pushdown altogether. This is inline with other types of pushdowns
which do not have specific switches on the engine side.
@losipiuk losipiuk force-pushed the lo/oportunistic-join-pushdown-cost-based-2 branch from 262615a to 3b2d2e8 Compare February 24, 2021 13:48
@losipiuk losipiuk merged commit a338de5 into trinodb:master Feb 24, 2021
@losipiuk losipiuk mentioned this pull request Mar 2, 2021
10 tasks
@losipiuk losipiuk added this to the 353 milestone Mar 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants