Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PushdownTopN through Project and Outer Join rules #419

Merged
merged 2 commits into from Mar 26, 2019

Conversation

kasiafi
Copy link
Member

@kasiafi kasiafi commented Mar 9, 2019

instead of #220

@cla-bot cla-bot bot added the cla-signed label Mar 9, 2019
sopel39
sopel39 previously approved these changes Mar 11, 2019
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % add tests

topN()
.with(source().matching(
project()
.matching(projectNode -> !projectNode.isIdentity())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not? because optimizer would loop forever?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, same is here: io.prestosql.sql.planner.iterative.rule.PushLimitThroughProject

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add a comment about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not(ProjectNode::isIdentity)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which not did you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extends BaseRuleTest
{
@Test
public void testPushTopNThroughLeftJoin()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also add test when top n has key from left and right side so it cannot be pushed to any side.

@Test
public void testPushTopNThroughLeftJoin()
{
tester().assertThat(new PushTopNThroughOuterJoin())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add test to TestLogicalPlanner so that we can see that the rules indeed work in actual plan (e.g: top n is pushed down through projection and join)

topN()
.with(source().matching(
project()
.matching(projectNode -> !projectNode.isIdentity())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, same is here: io.prestosql.sql.planner.iterative.rule.PushLimitThroughProject

project()
.matching(projectNode -> !projectNode.isIdentity())
.capturedAs(PROJECT_CHILD)
.with(source().matching(node -> !(node instanceof TableScanNode) && !(node instanceof FilterNode)))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. If there is no table scan behind filter node, then you could push TopN. So you need to check the child of filter node.

I understand it could be difficult to do in pattern, so maybe you could move it to apply.

@kasiafi kasiafi force-pushed the pushdownTopNThroughProjectOuter branch from 00c11cd to 1eeea6d Compare March 11, 2019 23:21
@kasiafi
Copy link
Member Author

kasiafi commented Mar 12, 2019

Applied comments.

.orElseGet(Result::empty);
}

private static Optional<JoinNode> joinNodeWithTopNSource(TopNNode topNNode, JoinNode joinNode, Context context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please inline this method

upper = min(sourceCardinalityRange.upperEndpoint(), node.getCount());
}
long lower = min(upper, sourceCardinalityRange.lowerEndpoint());
return Range.closed(lower, upper);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse code of visitLimit instead of copying.

@@ -299,7 +299,7 @@ public PlanNode getRight()
}

@Override
public PlanNode replaceChildren(List<PlanNode> newChildren)
public JoinNode replaceChildren(List<PlanNode> newChildren)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo


PlanNode projectSource = context.getLookup().resolveGroup(projectNode.getSource()).findFirst().orElseThrow(IllegalStateException::new);
if (projectSource instanceof FilterNode) {
PlanNode filterSource = context.getLookup().resolveGroup(((FilterNode) projectSource).getSource()).findFirst().orElseThrow(IllegalStateException::new);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format this as:

PlanNode filterSource = context.getLookup().resolveGroup(((FilterNode) projectSource).getSource())
    .findFirst()
    .orElseThrow(IllegalStateException::new);

Same above.

Also instead of findFirst you should use com.google.common.collect.MoreCollectors#toOptional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookup#resolveGroup followed by .findFirst() (or toOptional) is only pretending we support non-single element groups here.
It would be way better to use the (deprecated) io.prestosql.sql.planner.iterative.Lookup#resolve directly instead of pretending we do something better.

anyTree(
join(LEFT, ImmutableList.of(equiJoinClause("N_KEY", "R_KEY")),
project(
topN(1, ImmutableList.of(sort("N_COMM", ASCENDING, LAST)), TopNNode.Step.PARTIAL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put each argument in new line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to follow the formatting used in this class.

assertPlan("SELECT n.name, r.name FROM nation n LEFT JOIN region r ON n.regionkey = r.regionkey ORDER BY n.comment LIMIT 1",
anyTree(
project(
topN(1, ImmutableList.of(sort("N_COMM", ASCENDING, LAST)), TopNNode.Step.FINAL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@kasiafi kasiafi force-pushed the pushdownTopNThroughProjectOuter branch from 1eeea6d to f111645 Compare March 12, 2019 22:10
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments and a question.


private static final Pattern<TopNNode> PATTERN =
topN().with(source().matching(
join().capturedAs(JOIN_CHILD).with(type().matching(type -> isLeftOrFullOuter(type) || isRightOrFullOuter(type)))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this expression makes it harder to reason about what this is doing that it should: isLeftOrFullOuter(type) || isRightOrFullOuter(type). It's really just saying "is it an outer join?". So, I'd just replace this with: type == LEFT || type == RIGHT || type == FULL

return cardinality.hasUpperBound() && cardinality.upperEndpoint() <= limit;
}

private static boolean isLeftOrFullOuter(JoinNode.Type type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these methods make the code more readable. They say exactly what the condition expresses, so they are not hiding complexity. I'd just inline them

topN()
.with(source().matching(
project()
.matching(projectNode -> !projectNode.isIdentity())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add a comment about this.

if (projectSource instanceof FilterNode) {
PlanNode filterSource = context.getLookup().resolve(((FilterNode) projectSource).getSource());
if (filterSource instanceof TableScanNode) {
return Result.empty();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we limit this case? I would think this rule should not care about whether there's a TableScan or Filter involved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, there's another limitation in the PATTERN, which says that there cannot be a TableScanNode as a source of the Projection. Probably both limitations should go to apply() for symmetry.

Why do we limit this case?

I was about to ask @sopel39 and @kokosing about that, because it was their request. I want to add some explanatory comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint if you push topN below projection that is above filter or table scan then such projection won't be merged as a single PageProcessor with filter or/and table scan.

If such topN node is PARTIAL and we have a lot of splits then it could potentially be more expensive to split projection from filter. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, there's another limitation in the PATTERN, which says that there cannot be a TableScanNode as a source of the Projection. Probably both limitations should go to apply() for symmetry.

It's fine to have different but complementary constraints in the pattern vs in apply. Sometimes there are complex cases that cannot be easily described in terms of patterns. But if they can, there's better opportunity for the pattern matcher to do it more efficiently.

I want to add some explanatory comment.

Yes, please do! My general rule of thumb is "if I were to look at some code I wrote a few months down the road, would I be able to figure out what it's doing or the motivation?". If not, a comment is warranted. In this case, it's not entirely obvious why this pattern is excluded, so let's add an explanation.


TopNNode mappedTopN = symbolMapper.get().map(parent, projectNode.getSource(), context.getIdAllocator().getNextId());

return Result.ofPlanNode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these can go in the same line.

@Override
public Result apply(TopNNode parent, Captures captures, Context context)
{
JoinNode joinNode = captures.get(JOIN_CHILD);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the parent TopN is PARTIAL then it should be fully pushed down below join

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually. Make this rule only execute for PARTIAL TopN nodes as TopN split happens in CreatePartialTopN

@@ -290,6 +292,8 @@ public PlanOptimizers(
new PushLimitThroughMarkDistinct(),
new PushLimitThroughOuterJoin(),
new PushLimitThroughSemiJoin(),
new PushTopNThroughProject(),
new PushTopNThroughOuterJoin(),
Copy link
Member

@sopel39 sopel39 Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PushTopNThroughOuterJoin should be only executed with

        builder.add(new IterativeOptimizer(
                ruleStats,
                statsCalculator,
                estimatedExchangesCostCalculator,
                ImmutableSet.of(
                        new CreatePartialTopN(),
                        new PushTopNThroughUnion())));

PushTopNThroughProject should be executed here and together with PushTopNThroughOuterJoin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move PushTopNThroughProject and PushTopNThroughOuterJoin there? Don't they belong together with PushLimitThroughProject and PushLimitThroughOuterJoin?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move PushTopNThroughProject and PushTopNThroughOuterJoin there?

It is because in that section partial TopN nodes are created via CreatePartialTopN() rule. There is already PushTopNThroughUnion() rule there that pushes partial aggregations though union.

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more comments

@sopel39 sopel39 dismissed their stale review March 13, 2019 10:10

more comments

@kasiafi kasiafi force-pushed the pushdownTopNThroughProjectOuter branch from f111645 to 3cd8d0a Compare March 21, 2019 21:11
@kasiafi
Copy link
Member Author

kasiafi commented Mar 22, 2019

Applied comments.

* <pre>
* - TopN
* - Project (non-identity)
* - Source other than Filter(TableScan) / TableScan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace / with or

topN()
.with(source().matching(
project()
.matching(projectNode -> !projectNode.isIdentity())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not(ProjectNode::isIdentity)

.matching(projectNode -> !projectNode.isIdentity())
.capturedAs(PROJECT_CHILD)
// do not push topN between projection and table scan so that they can be merged into a PageProcessor
.with(source().matching(node -> !(node instanceof TableScanNode)))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: There is io.prestosql.util.MorePredicates#isInstanceOfAny. Maybe we can add something like isNotInstanceOf there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be taken care of in the future.

}

TopNNode mappedTopN = symbolMapper.get().map(parent, projectNode.getSource(), context.getIdAllocator().getNextId());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not needed vertical space.

I would even inline mappedTopN

p.project(
Assignments.of(
projectedA, new SymbolReference("a"),
projectedC, new ArithmeticBinaryExpression(ADD, new SymbolReference("a"), new SymbolReference("b"))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: expression("a + b")

@@ -438,7 +438,9 @@ public PlanOptimizers(
estimatedExchangesCostCalculator,
ImmutableSet.of(
new CreatePartialTopN(),
new PushTopNThroughUnion())));
new PushTopNThroughUnion(),
new PushTopNThroughProject(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line belongs to previous commit.

return applyLimit(node.getSource(), node.getCount());
}

private Range<Long> applyLimit(PlanNode source, long limit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/applyLimit/visitLimitSource ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/applyLimit/visitLimitSource ?

I don't think we should rename it to visitXXX since it makes it easy to confuse with the methods that are implementations of the visitor interface. This is just an internal utility method.

project(
topN(1, ImmutableList.of(sort("N_COMM", ASCENDING, LAST)), TopNNode.Step.PARTIAL,
tableScan("nation", ImmutableMap.of("N_NAME", "name", "N_KEY", "regionkey", "N_COMM", "comment")))),
anyTree(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits:

replace with anyNot(TopNode.class, anyTree(...))?
The best would be anyTreeNot(TopNode.class, tableScan(...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't find anyTreeNot()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it does not exists yet. It would be the best if it would have existed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be added in another PR. I could give it a try later.

@kasiafi kasiafi force-pushed the pushdownTopNThroughProjectOuter branch from 3cd8d0a to 1d6480f Compare March 22, 2019 22:24
@kasiafi
Copy link
Member Author

kasiafi commented Mar 22, 2019

Applied some comments.

@kasiafi
Copy link
Member Author

kasiafi commented Mar 25, 2019

@martint do you have any more comments?

return applyLimit(node.getSource(), node.getCount());
}

private Range<Long> applyLimit(PlanNode source, long limit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/applyLimit/visitLimitSource ?

I don't think we should rename it to visitXXX since it makes it easy to confuse with the methods that are implementations of the visitor interface. This is just an internal utility method.

@kokosing kokosing merged commit ec57ed6 into trinodb:master Mar 26, 2019
@kokosing
Copy link
Member

Merged, thanks!

@kokosing kokosing mentioned this pull request Mar 26, 2019
6 tasks
@kokosing kokosing added this to the 307 milestone Mar 26, 2019
topN()
.with(source().matching(
project()
// do not push topN through identity projection which could be there for column pruning purposes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean we need a Push TopN through Project into TableScan rule to fully leverage #6847?

cc @losipiuk @wendigo

Copy link
Member

@sopel39 sopel39 Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially combine two rules into single one, e.g: try to pushdown though project and into connector

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, a rule set; i created #7029 for this, let's move discussion over there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants