SPI and optimizer rule for connectors that can support complete topN … #4249

Merged
merged 1 commit into from Aug 5, 2020

Parth-Brahmbhatt
Member

…pushdown

@cla-bot cla-bot bot added the cla-signed label Jun 26, 2020
@martint martint self-requested a review June 26, 2020 21:38
@Parth-Brahmbhatt
Member Author

I have made the assumption that a connector that states it can support complete topN will guarantee N or fewer results. If we think that is too strict a requirement, we can do something similar to limit pushdown.


import static java.util.Objects.requireNonNull;

public class TopN
Member

Having a dedicated class for this seems overkill. Every place that uses this has immediate access to the count and SortItems from a TopNNode. The SPI method can just take both fields separately.

Member Author

Removed the class and passed topNCount and sortItems as separate arguments.

* If the connector can handle TopN Pushdown it should return a new table handle which will replace the existing
* table handle in the TableScan.
*/
default Optional<ConnectorTableHandle> applyTopN(
Member

I think we'll want to return a structure similar to applyLimit, where it indicates whether the limit is guaranteed by the connector. In the case of a distributed connector implementation, it may not be able to perform a global top N, but it could still benefit from pushing down the operation, for instance, to each individual storage shard.

Member Author

Added TopNApplicationResult, which is basically the same as LimitApplicationResult for now, and added a test case verifying that when limitGuaranteed is set to false we create a plan with a TopNPartial stage that references the updated connector table handle.
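
For readers following the thread, here is a minimal sketch of the result shape being discussed. It is not the SPI code from this PR: `Handle`, the `sharded` flag, and the string-based handle description are hypothetical stand-ins, and only the names `applyTopN`, `TopNApplicationResult`, and `isTopNGuaranteed` come from the conversation.

```java
import java.util.List;
import java.util.Optional;

// Illustrative only: simplified stand-ins, not the actual SPI classes.
final class TopNResultSketch
{
    private TopNResultSketch() {}

    // Hypothetical stand-in for a connector table handle.
    static final class Handle
    {
        final String description;

        Handle(String description)
        {
            this.description = description;
        }
    }

    // Carries the new handle plus a flag telling the engine whether it may drop its own TopN.
    static final class TopNApplicationResult
    {
        private final Handle handle;
        private final boolean topNGuaranteed;

        TopNApplicationResult(Handle handle, boolean topNGuaranteed)
        {
            this.handle = handle;
            this.topNGuaranteed = topNGuaranteed;
        }

        Handle getHandle()
        {
            return handle;
        }

        boolean isTopNGuaranteed()
        {
            return topNGuaranteed;
        }
    }

    // Hypothetical connector hook. The "sharded" flag is an assumption used to model a
    // connector that can only apply top N per shard and so cannot guarantee a global top N.
    static Optional<TopNApplicationResult> applyTopN(Handle handle, long topNCount, List<String> sortItems, boolean sharded)
    {
        if (topNCount < 0 || sortItems.isEmpty()) {
            return Optional.empty(); // nothing to push down
        }
        Handle pushed = new Handle(handle.description + " [topN " + topNCount + " by " + sortItems + "]");
        return Optional.of(new TopNApplicationResult(pushed, !sharded));
    }
}
```

An empty `Optional` means the connector declined the pushdown. A present result with `isTopNGuaranteed()` returning false means the engine still has to apply its own top N above the scan.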

Member Author

@martint can you review this again when you find some time?

@martint martint self-requested a review August 3, 2020 19:00
tableScan.getOutputSymbols(),
tableScan.getAssignments()));
if (!result.isTopNGuaranteed()) {
node = new TopNNode(topNNode.getId(), node, topNNode.getCount(), topNNode.getOrderingScheme(), TopNNode.Step.PARTIAL);
Member

I believe this should be Step.FINAL. The partial Top N is being handled by the connector at this point.

Member Author

Fixed.

Member

Followed up here: #4249 (comment)
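
To illustrate why an engine-side top N is still needed when the connector only handles the partial step, here is a small self-contained sketch. It assumes a hypothetical connector that returns the top N rows of each shard independently. The class and method names are invented for illustration and integers stand in for rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: models per-shard (partial) top N followed by an engine-side top N.
final class ShardedTopNSketch
{
    private ShardedTopNSketch() {}

    // Top N of a list of rows, largest first.
    static List<Integer> topN(List<Integer> rows, int n)
    {
        return rows.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    // What the connector would return: each shard's own top N, concatenated.
    // This can contain more than N rows, so the top N is not guaranteed.
    static List<Integer> connectorPartialTopN(List<List<Integer>> shards, int n)
    {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> shard : shards) {
            partials.addAll(topN(shard, n));
        }
        return partials;
    }

    // What the engine does above the scan: one more top N over the partial results.
    static List<Integer> engineTopN(List<List<Integer>> shards, int n)
    {
        return topN(connectorPartialTopN(shards, n), n);
    }
}
```

With three shards and N = 2, the connector may hand back up to six rows, and the engine-side step reduces them to the global top two.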

Comment on lines +41 to +42
private static final Pattern<TopNNode> PATTERN = topN().with(source().matching(
tableScan().capturedAs(TABLE_SCAN)));
Member

I think this should trigger on SINGLE step only.
We should not push partial and final separately (and IMO we should not push them at all).

@@ -623,7 +624,8 @@ public PlanOptimizers(
 new CreatePartialTopN(),
 new PushTopNThroughProject(),
 new PushTopNThroughOuterJoin(),
-new PushTopNThroughUnion())));
+new PushTopNThroughUnion(),
+new PushTopNIntoTableScan(metadata))));
Member

This should be added to pushIntoTableScanOptimizer, where the other PushXxxIntoTableScan rules are registered.

This is important for two reasons:

  • this will unlock other pushdowns once TopN is pushed into TableScan fully
  • this will let the rule operate on TopNNode while it is SINGLE step (and consume it fully), making reasoning about engine-connector interactions simpler

@wendigo is going to address this in #6847

tableScan.getAssignments());

if (!result.isTopNGuaranteed()) {
node = new TopNNode(topNNode.getId(), node, topNNode.getCount(), topNNode.getOrderingScheme(), TopNNode.Step.FINAL);
Member

Relates to #4249 (comment)

Inserting a FINAL step is not correct when the rule is triggered on a TopNNode with step PARTIAL.

I suggest one of the following:

  • make the rule trigger for SINGLE step only (my preferred option)
    • I am aware the sibling PushLimitIntoTableScan triggers for partial limits, but I am not convinced it is beneficial
  • keep the same step the node originally had (basically, use topNNode.replaceChildren)

(this is not being addressed in @wendigo #6847)
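
The two suggestions above can be compared with a small sketch. It is a simplified model, not the optimizer rule itself: plan shapes are represented as strings, `TableScan'` denotes a scan with the updated table handle, and all names other than the `SINGLE`, `PARTIAL`, and `FINAL` steps are hypothetical.

```java
import java.util.Optional;

// Illustrative only: models which TopN node (if any) stays above the rewritten scan.
final class TopNStepSketch
{
    private TopNStepSketch() {}

    enum Step { SINGLE, PARTIAL, FINAL }

    // Option 1: the rule matches SINGLE only; other steps are left untouched (empty result).
    static Optional<String> rewriteSingleOnly(Step step, boolean topNGuaranteed)
    {
        if (step != Step.SINGLE) {
            return Optional.empty(); // rule does not fire
        }
        return Optional.of(topNGuaranteed ? "TableScan'" : "TopN[SINGLE](TableScan')");
    }

    // Option 2: the rule matches any step and, when the connector gives no guarantee,
    // keeps the TopN with its original step instead of forcing FINAL.
    static String rewriteKeepingStep(Step step, boolean topNGuaranteed)
    {
        return topNGuaranteed ? "TableScan'" : "TopN[" + step + "](TableScan')";
    }
}
```

Under either option, a PARTIAL node is never turned into a FINAL one, which is the problem raised in this comment.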

@findepi
Member

findepi commented Feb 23, 2021

cc @losipiuk
