New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PushdownTopN through Project and Outer Join rules #419
Add PushdownTopN through Project and Outer Join rules #419
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % add tests
topN() | ||
.with(source().matching( | ||
project() | ||
.matching(projectNode -> !projectNode.isIdentity()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not? because optimizer would loop forever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, same is here: io.prestosql.sql.planner.iterative.rule.PushLimitThroughProject
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a comment about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not(ProjectNode::isIdentity)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which not
did you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extends BaseRuleTest | ||
{ | ||
@Test | ||
public void testPushTopNThroughLeftJoin() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also add test when top n has key from left and right side so it cannot be pushed to any side.
@Test | ||
public void testPushTopNThroughLeftJoin() | ||
{ | ||
tester().assertThat(new PushTopNThroughOuterJoin()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also add test to TestLogicalPlanner
so that we can see that the rules indeed work in actual plan (e.g: top n is pushed down through projection and join)
topN() | ||
.with(source().matching( | ||
project() | ||
.matching(projectNode -> !projectNode.isIdentity()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, same is here: io.prestosql.sql.planner.iterative.rule.PushLimitThroughProject
project() | ||
.matching(projectNode -> !projectNode.isIdentity()) | ||
.capturedAs(PROJECT_CHILD) | ||
.with(source().matching(node -> !(node instanceof TableScanNode) && !(node instanceof FilterNode))))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct. If there is no table scan behind filter node, then you could push TopN. So you need to check the child of filter node.
I understand it could be difficult to do in pattern, so maybe you could move it to apply
.
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PushTopNThroughProject.java
Show resolved
Hide resolved
00c11cd
to
1eeea6d
Compare
Applied comments. |
.orElseGet(Result::empty); | ||
} | ||
|
||
private static Optional<JoinNode> joinNodeWithTopNSource(TopNNode topNNode, JoinNode joinNode, Context context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please inline this method
upper = min(sourceCardinalityRange.upperEndpoint(), node.getCount()); | ||
} | ||
long lower = min(upper, sourceCardinalityRange.lowerEndpoint()); | ||
return Range.closed(lower, upper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse code of visitLimit
instead of copying.
@@ -299,7 +299,7 @@ public PlanNode getRight() | |||
} | |||
|
|||
@Override | |||
public PlanNode replaceChildren(List<PlanNode> newChildren) | |||
public JoinNode replaceChildren(List<PlanNode> newChildren) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undo
|
||
PlanNode projectSource = context.getLookup().resolveGroup(projectNode.getSource()).findFirst().orElseThrow(IllegalStateException::new); | ||
if (projectSource instanceof FilterNode) { | ||
PlanNode filterSource = context.getLookup().resolveGroup(((FilterNode) projectSource).getSource()).findFirst().orElseThrow(IllegalStateException::new); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please format this as:
PlanNode filterSource = context.getLookup().resolveGroup(((FilterNode) projectSource).getSource())
.findFirst()
.orElseThrow(IllegalStateException::new);
Same above.
Also instead of findFirst
you should use com.google.common.collect.MoreCollectors#toOptional
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookup#resolveGroup
followed by .findFirst()
(or toOptional
) is only pretending we support non-single element groups here.
It would be way better to use the (deprecated) io.prestosql.sql.planner.iterative.Lookup#resolve
directly instead of pretending we do something better.
anyTree( | ||
join(LEFT, ImmutableList.of(equiJoinClause("N_KEY", "R_KEY")), | ||
project( | ||
topN(1, ImmutableList.of(sort("N_COMM", ASCENDING, LAST)), TopNNode.Step.PARTIAL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put each argument in new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to follow the formatting used in this class.
assertPlan("SELECT n.name, r.name FROM nation n LEFT JOIN region r ON n.regionkey = r.regionkey ORDER BY n.comment LIMIT 1", | ||
anyTree( | ||
project( | ||
topN(1, ImmutableList.of(sort("N_COMM", ASCENDING, LAST)), TopNNode.Step.FINAL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
1eeea6d
to
f111645
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments and a question.
|
||
private static final Pattern<TopNNode> PATTERN = | ||
topN().with(source().matching( | ||
join().capturedAs(JOIN_CHILD).with(type().matching(type -> isLeftOrFullOuter(type) || isRightOrFullOuter(type))))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this expression makes it harder to reason about what this is doing that it should: isLeftOrFullOuter(type) || isRightOrFullOuter(type)
. It's really just saying "is it an outer join?". So, I'd just replace this with: type == LEFT || type == RIGHT || type == FULL
return cardinality.hasUpperBound() && cardinality.upperEndpoint() <= limit; | ||
} | ||
|
||
private static boolean isLeftOrFullOuter(JoinNode.Type type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these methods make the code more readable. They say exactly what the condition expresses, so they are not hiding complexity. I'd just inline them
topN() | ||
.with(source().matching( | ||
project() | ||
.matching(projectNode -> !projectNode.isIdentity()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a comment about this.
if (projectSource instanceof FilterNode) { | ||
PlanNode filterSource = context.getLookup().resolve(((FilterNode) projectSource).getSource()); | ||
if (filterSource instanceof TableScanNode) { | ||
return Result.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we limit this case? I would think this rule should not care about whether there's a TableScan or Filter involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, there's another limitation in the PATTERN
, which says that there cannot be a TableScanNode
as a source of the Projection. Probably both limitations should go to apply()
for symmetry.
Why do we limit this case?
I was about to ask @sopel39 and @kokosing about that, because it was their request. I want to add some explanatory comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martint if you push topN below projection that is above filter or table scan then such projection won't be merged as a single PageProcessor
with filter or/and table scan.
If such topN node is PARTIAL
and we have a lot of splits then it could potentially be more expensive to split projection from filter. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, there's another limitation in the PATTERN, which says that there cannot be a TableScanNode as a source of the Projection. Probably both limitations should go to apply() for symmetry.
It's fine to have different but complementary constraints in the pattern vs in apply. Sometimes there are complex cases that cannot be easily described in terms of patterns. But if they can, there's better opportunity for the pattern matcher to do it more efficiently.
I want to add some explanatory comment.
Yes, please do! My general rule of thumb is "if I were to look at some code I wrote a few months down the road, would I be able to figure out what it's doing or the motivation?". If not, a comment is warranted. In this case, it's not entirely obvious why this pattern is excluded, so let's add an explanation.
|
||
TopNNode mappedTopN = symbolMapper.get().map(parent, projectNode.getSource(), context.getIdAllocator().getNextId()); | ||
|
||
return Result.ofPlanNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these can go in the same line.
@Override | ||
public Result apply(TopNNode parent, Captures captures, Context context) | ||
{ | ||
JoinNode joinNode = captures.get(JOIN_CHILD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the parent
TopN
is PARTIAL
then it should be fully pushed down below join
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually. Make this rule only execute for PARTIAL
TopN nodes as TopN split happens in CreatePartialTopN
@@ -290,6 +292,8 @@ public PlanOptimizers( | |||
new PushLimitThroughMarkDistinct(), | |||
new PushLimitThroughOuterJoin(), | |||
new PushLimitThroughSemiJoin(), | |||
new PushTopNThroughProject(), | |||
new PushTopNThroughOuterJoin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PushTopNThroughOuterJoin
should be only executed with
builder.add(new IterativeOptimizer(
ruleStats,
statsCalculator,
estimatedExchangesCostCalculator,
ImmutableSet.of(
new CreatePartialTopN(),
new PushTopNThroughUnion())));
PushTopNThroughProject
should be executed here and together with PushTopNThroughOuterJoin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move PushTopNThroughProject
and PushTopNThroughOuterJoin
there? Don't they belong together with PushLimitThroughProject
and PushLimitThroughOuterJoin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move PushTopNThroughProject and PushTopNThroughOuterJoin there?
It is because in that section partial TopN
nodes are created via CreatePartialTopN()
rule. There is already PushTopNThroughUnion()
rule there that pushes partial aggregations though union.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more comments
f111645
to
3cd8d0a
Compare
Applied comments. |
* <pre> | ||
* - TopN | ||
* - Project (non-identity) | ||
* - Source other than Filter(TableScan) / TableScan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace /
with or
topN() | ||
.with(source().matching( | ||
project() | ||
.matching(projectNode -> !projectNode.isIdentity()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not(ProjectNode::isIdentity)
.matching(projectNode -> !projectNode.isIdentity()) | ||
.capturedAs(PROJECT_CHILD) | ||
// do not push topN between projection and table scan so that they can be merged into a PageProcessor | ||
.with(source().matching(node -> !(node instanceof TableScanNode))))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There is io.prestosql.util.MorePredicates#isInstanceOfAny
. Maybe we can add something like isNotInstanceOf
there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be taken care of in the future.
} | ||
|
||
TopNNode mappedTopN = symbolMapper.get().map(parent, projectNode.getSource(), context.getIdAllocator().getNextId()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not needed vertical space.
I would even inline mappedTopN
p.project( | ||
Assignments.of( | ||
projectedA, new SymbolReference("a"), | ||
projectedC, new ArithmeticBinaryExpression(ADD, new SymbolReference("a"), new SymbolReference("b"))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: expression("a + b")
@@ -438,7 +438,9 @@ public PlanOptimizers( | |||
estimatedExchangesCostCalculator, | |||
ImmutableSet.of( | |||
new CreatePartialTopN(), | |||
new PushTopNThroughUnion()))); | |||
new PushTopNThroughUnion(), | |||
new PushTopNThroughProject(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line belongs to previous commit.
return applyLimit(node.getSource(), node.getCount()); | ||
} | ||
|
||
private Range<Long> applyLimit(PlanNode source, long limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/applyLimit/visitLimitSource
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/applyLimit/visitLimitSource ?
I don't think we should rename it to visitXXX since it makes it easy to confuse with the methods that are implementations of the visitor interface. This is just an internal utility method.
project( | ||
topN(1, ImmutableList.of(sort("N_COMM", ASCENDING, LAST)), TopNNode.Step.PARTIAL, | ||
tableScan("nation", ImmutableMap.of("N_NAME", "name", "N_KEY", "regionkey", "N_COMM", "comment")))), | ||
anyTree( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits:
replace with anyNot(TopNode.class, anyTree(...))
?
The best would be anyTreeNot(TopNode.class, tableScan(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't find anyTreeNot()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it does not exists yet. It would be the best if it would have existed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be added in another PR. I could give it a try later.
3cd8d0a
to
1d6480f
Compare
Applied some comments. |
@martint do you have any more comments? |
return applyLimit(node.getSource(), node.getCount()); | ||
} | ||
|
||
private Range<Long> applyLimit(PlanNode source, long limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/applyLimit/visitLimitSource ?
I don't think we should rename it to visitXXX since it makes it easy to confuse with the methods that are implementations of the visitor interface. This is just an internal utility method.
Merged, thanks! |
topN() | ||
.with(source().matching( | ||
project() | ||
// do not push topN through identity projection which could be there for column pruning purposes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially combine two rules into single one, e.g: try to pushdown though project and into connector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, a rule set; i created #7029 for this, let's move discussion over there
instead of #220