New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass relation statistics to applyJoin in ConnectorMetadata #7000
Pass relation statistics to applyJoin in ConnectorMetadata #7000
Conversation
* | ||
* As soon as we have cost based logic to determine join pushdown feasibility we | ||
* will change the default to automatic. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK for now, but we will actually want to enable it by default on the engine side, and then disable by default on the connector side.
Thus, the engine side should be simplified to be true/false toggle, and true by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see it's being removed in next commit. Thanks for having this in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With new approach we should probably drop JoinPushdownMode
altogether and just have enable-join-pushdown
config property (and matching session property).
WDYT @findepi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and BTW this is what i was trying to express. You just did it way better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are very kind :)
* | ||
* As soon as we have cost based logic to determine join pushdown feasibility we | ||
* will change the default to automatic. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I should have left the initial part of the comment I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this PR -- sure. But ultimately we want to enable this by default in engine, as we transfer the responsibility onto connectors.
core/trino-spi/src/main/java/io/trino/spi/connector/BasicRelationStatistics.java
Outdated
Show resolved
Hide resolved
@@ -128,7 +134,8 @@ public Result apply(JoinNode joinNode, Captures captures, Context context) | |||
filterSplitResult.getPushableConditions(), | |||
// TODO we could pass only subset of assignments here, those which are needed to resolve filterSplitResult.getPushableConditions | |||
leftAssignments, | |||
rightAssignments); | |||
rightAssignments, | |||
getJoinStatistics(joinNode, left, right, context)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here or at getJoinStatistics() definition, include the comment why we're doing this, as this is not obvious. One could think connector could determine this on its own, at least for table scans.
The reasons are:
- the engine knows how to estimate join and connector may not
- the engine may have cached stats for the table scans (within
context.getStatsProvider()
), so can be able to provide information more inexpensively - in the future, the engine can be able to provide stats for table scan even in case when connector no longer can, linking to Remember best known statistics for TableHandle on engine side #6998
- the logic can be different (configured differently) for separate catalogs or separate connectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(@losipiuk edited, added fourth bullet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the comment be rather in SPI method javadoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I put it in rule for now)
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
Why would the connector be able to make a decision based on the stats for left/right/join that the engine cannot do? |
@martint The thinking here is that you need to have deep understanding of underlying data source to be able to perform informed decision. We would also like to be able to configure whether join is to happen EAGERLY or cost based on per-connector basis, and to have that ultimately connector need to decide. |
@martint see also #7000 (comment). |
b66310d
to
0abd250
Compare
Pass left, right and join statistics to connector via applyJoin, to allow connector make informed decision if it makes sense to perform join pushdown. Given how heterogenous connectors are, and how different internal capabilities each one have, we cannot make reasonable decision on the engine side.
0abd250
to
c01fb5d
Compare
AC |
core/trino-main/src/main/java/io/trino/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/FeaturesConfig.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/FeaturesConfig.java
Outdated
Show resolved
Hide resolved
c01fb5d
to
262615a
Compare
We are moving decision process if we should pursue with join pushdown to connectors. On the engine side we are leaving global switch to disable all pushdown altogether. This is inline with other types of pushdowns which do not have specific switches on the engine side.
262615a
to
3b2d2e8
Compare
No description provided.