
Optimize execution for output duplicates insensitive joins #5981

Merged
merged 4 commits into from Jan 14, 2021

Conversation

sopel39
Member

@sopel39 sopel39 commented Nov 16, 2020

No description provided.

@sopel39
Member Author

sopel39 commented Nov 16, 2020

cc: @martint

@sopel39 sopel39 force-pushed the ks/optimize_join branch 2 times, most recently from 542e1b0 to 62ccb84 Compare November 16, 2020 21:33
Member

@kasiafi kasiafi left a comment


A few questions:

  1. Does this change affect any TPCH / DS queries? If so, how much do they benefit?
  2. Could we have similar rules for ExceptNode distinct and IntersectNode distinct? Also, could the rule's Visitor support those?
  3. How about removing the AggregationNode in the case when it has a single grouping set over all outputs and deduplication is supported for the particular join?

Member Author

@sopel39 sopel39 left a comment


@kasiafi ack

Does this change affect any TPCH / DS queries? If so, how much do they benefit?

This is to be benchmarked. I've seen lots of queries whose plans changed.

Could we have similar rules for ExceptNode distinct and IntersectNode distinct? Also, could the rule's Visitor support those?

Sure, we could use it in more places. To name a few:

  • semi join
  • correlated EXISTS with non-eq conditions instead of streaming aggregations

Could you help to identify more places where it can be used? This doesn't have to be part of this PR though.

How about removing the AggregationNode in the case when it has a single grouping set over all outputs and deduplication is supported for the particular join?

That doesn't work because only matches are deduplicated, but not join input rows.
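To illustrate the point (a hypothetical sketch, not Trino code): a first-match join deduplicates build-side matches per probe row, but duplicate probe rows still each produce an output row, so the aggregation above the join remains necessary.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a first-match inner join over integer lists. It shows why
// the AggregationNode above the join cannot simply be removed: only matches
// are deduplicated, not join input rows, so duplicate probe rows still
// yield duplicate outputs that only the aggregation removes.
public class FirstMatchJoinSketch
{
    // Emit the probe key for the first build match only (skipping duplicate matches)
    static List<Integer> firstMatchJoin(List<Integer> probe, List<Integer> build)
    {
        List<Integer> output = new ArrayList<>();
        for (int p : probe) {
            for (int b : build) {
                if (p == b) {
                    output.add(p);
                    break; // first match only: build-side duplicates are skipped
                }
            }
        }
        return output;
    }

    public static void main(String[] args)
    {
        // build side duplicates the value 1; probe side duplicates the value 2
        List<Integer> probe = List.of(1, 2, 2);
        List<Integer> build = List.of(1, 1, 2);
        // matches are deduplicated, but duplicate probe rows still appear twice
        System.out.println(firstMatchJoin(probe, build)); // prints [1, 2, 2]
    }
}
```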

@sopel39 sopel39 force-pushed the ks/optimize_join branch 3 times, most recently from a595002 to 82830cf Compare November 17, 2020 23:23
@sopel39 sopel39 requested a review from kasiafi November 17, 2020 23:23
@sopel39 sopel39 changed the title Optimize execution for cardinality insensitive joins Optimize execution for output duplicates insensitive joins Nov 18, 2020
@sopel39 sopel39 force-pushed the ks/optimize_join branch 2 times, most recently from 77d6b11 to 3bbbf33 Compare November 18, 2020 10:56
@@ -336,8 +342,13 @@ private boolean joinCurrentPosition(LookupSource lookupSource, DriverYieldSignal
joinSourcePositions++;
}

// get next position on lookup side for this probe row
joinPosition = lookupSource.getNextJoinPosition(joinPosition, probe.getPosition(), probe.getPage());
if (!matchSingleBuildRow || !currentProbePositionProducedRow) {
Member

What if we could pass this flag (matchSingleBuildRow) down so that when we build the hash table we could make sure that only one element exists? Or override getNextJoinPosition to return -1. Not sure if it would affect other places

Member

Or maybe a simple PositionalLink that could return -1 for PositionalLink#next

Member Author

That would work when there are no extra filters in JoinNode#filter. It's a good idea. Could it be a follow up?
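The short-circuit being discussed can be sketched in isolation (a simplified, hypothetical model of the probe loop; names mirror the diff but nothing here is actual Trino code): once the current probe position has produced a row and single-match mode is on, the loop stops following the chain of build positions.

```java
// Simplified, hypothetical model of the probe loop guard
// `if (!matchSingleBuildRow || !currentProbePositionProducedRow)`.
public class ProbeLoopSketch
{
    // Simulated chain of matching build positions: next[i] is the position
    // after i, or -1 when the chain ends (as in PositionLinks).
    static int countProducedRows(int[] next, int firstPosition, boolean matchSingleBuildRow)
    {
        int produced = 0;
        int joinPosition = firstPosition;
        boolean currentProbePositionProducedRow = false;
        while (joinPosition >= 0) {
            produced++; // pretend every position passes the join filter
            currentProbePositionProducedRow = true;
            if (!matchSingleBuildRow || !currentProbePositionProducedRow) {
                // get next position on lookup side for this probe row
                joinPosition = next[joinPosition];
            }
            else {
                break; // single-match mode: skip the remaining duplicates
            }
        }
        return produced;
    }

    public static void main(String[] args)
    {
        int[] chain = {1, 2, -1}; // positions 0 -> 1 -> 2
        System.out.println(countProducedRows(chain, 0, false)); // 3
        System.out.println(countProducedRows(chain, 0, true));  // 1
    }
}
```

The alternative raised above (a PositionalLink that returns -1) would move the early exit into the link structure itself instead of this guard.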

@kasiafi
Member

kasiafi commented Nov 18, 2020

Could you help to identify more places where it can be used? This doesn't have to be part of this PR though.

Here's a summary: #6005

@martint
Member

martint commented Nov 18, 2020

This looks like a semijoin in disguise. Have we looked at turning these joins into semijoins instead?

@sopel39
Member Author

sopel39 commented Nov 18, 2020

This looks like a semijoin in disguise. Have we looked at turning these joins into semijoins instead?

@martint we purposefully convert filtering semi-joins into joins + aggregation because:

  1. they can participate in CBO, which is the main advantage
  2. join build side construction is parallelized

An output-duplicates-insensitive join might be similar to some kind of semi-join with an extra filter predicate. However, such a semi-join would be very similar to the lookup-join implementation in this PR.

@martint
Member

martint commented Nov 18, 2020

they can participate in CBO, which is the main advantage

They could be turned into a semijoin after the CBO rules run

join build side construction is parallelized

That's something we should improve for semijoin operator, regardless

An output-duplicates-insensitive join might be similar to some kind of semi-join with an extra filter predicate. However, such a semi-join would be very similar to the lookup-join implementation in this PR.

Yes, parts would be similar, but semi-join doesn't need to worry about producing the values from the other side so that's a lot of code that's not relevant. My concern is about turning the join operator into a "kitchen sink" operator.

@sopel39
Member Author

sopel39 commented Nov 18, 2020

They could be turned into a semijoin after the CBO rules run

  1. If the CBO flips join sides, then there is no corresponding semi join (there is no reverse of the filtering-semi-join -> join transformation)
  2. We should avoid additional rule dependencies.

That's something we should improve for semijoin operator, regardless

I'm not convinced semi-join is that beneficial from a performance point of view. It might be a necessity for a projected semi-join, but for a filtering semi-join, pruning rows within the join performs better than a filter -> semi-join pair (e.g. because dictionaries are used). Of course you could introduce another operator like filtering-semi-join, but is that really simpler?

Yes, parts would be similar, but semi-join doesn't need to worry about producing the values from the other side so that's a lot of code that's not relevant.

@martint the resemblance to semi-join is superficial. In order to achieve a similar level of support for filtering semi-join as for first-match join, you would need to:

  1. Add support for arbitrary filters in semi-join (like x <> y in tpcds/q95)
  2. Add support for (Partitioned)LookupSource in semi-joins (to be able to evaluate arbitrary filter)
  3. Add support for filtering rows within the semi-join itself and use dictionaries for outputting selected probe rows.
  4. Add dedicated support for inequality filters (aka SortedPositionLinks)
  5. Most likely add support for spill

At this point you pretty much have another implementation of lookup join, with the caveat that the new implementation cannot output build columns (which lookup join can very easily do as well).

@sopel39
Member Author

sopel39 commented Nov 18, 2020

Additionally, the rule in this PR works when join sides are flipped, which is another difference from semi join.

@martint
Member

martint commented Nov 18, 2020

If the CBO flips join sides, then there is no corresponding semi join (there is no reverse of the filtering-semi-join -> join transformation)

That wouldn't matter, no? For cases where we want "first match" and the columns being output matter, the order can't be flipped without losing the first match semantics.

@@ -302,6 +306,12 @@ public PlanNode getRight()
return spillable;
}

@JsonProperty("canSkipOutputDuplicates")
public boolean isCanSkipOutputDuplicates()
Member

This will affect the computed estimates for the join, but I don't see anything in this PR that adjusts that logic. The resulting cardinality should be no larger than that of the left side of the join.

Member Author

@sopel39 sopel39 Nov 18, 2020

Currently, OptimizeOutputDuplicatesInsensitiveJoinRule is executed after ReorderJoins, since ReorderJoins creates new JoinNodes without the canSkipOutputDuplicates property.

I think we could improve ReorderJoins to understand that an entire join subtree matches canSkipOutputDuplicates, but that is not part of this PR.

@@ -98,6 +100,7 @@ public JoinNode(
this.criteria = ImmutableList.copyOf(criteria);
this.leftOutputSymbols = ImmutableList.copyOf(leftOutputSymbols);
this.rightOutputSymbols = ImmutableList.copyOf(rightOutputSymbols);
this.canSkipOutputDuplicates = canSkipOutputDuplicates;
Member

canSkipOutputDuplicates = true only makes sense if the only columns being output come from the left side of the join (or are equi-join keys in the case of an inner join). Also, this doesn't work for RIGHT or FULL join, so we should validate we're not creating an inconsistent node.

Also, given the semantics of this property, I'm not sure canSkipOutputDuplicates is the best name. It'd be more accurate to call it "outputsFirstMatchOnly" or something similar.

Member Author

@sopel39 sopel39 Nov 18, 2020

Also, this doesn't work for RIGHT or FULL join, so we should validate we're not creating an inconsistent node.

@martint Initially I thought about making JoinNode#canSkipOutputDuplicates implementation specific (to model how LookupJoinOperator and cross join work). However, this complicates reasoning and is more prone to rule-ordering issues. For example, with the current approach the following scenario would work:

  1. OptimizeOutputDuplicatesInsensitiveJoinRule could mark RIGHT join as canSkipOutputDuplicates
  2. later, some other rule could flip the join sides, but the canSkipOutputDuplicates property would be preserved.

Similarly, other rules like PPD, unaliasing, or pruning can fire while the canSkipOutputDuplicates property is preserved. If I connected canSkipOutputDuplicates to some concrete join implementation, that would require additional checks for whether canSkipOutputDuplicates can still be preserved. In the worst case, the canSkipOutputDuplicates property would be lost during planning.

By making canSkipOutputDuplicates a property of the join algebra itself (and not of the join implementation), the planning logic gets simpler and more stable.

It seems that eventually we should have different concrete plan nodes for different join implementations (e.g. LookupJoin, SortJoin, CrossJoin). That would have the additional benefit of being able to compare plans with different join implementations.

Member

Like I suggested in #6005, we need to sort it out before we start optimizing other plans based on the canSkipOutputDuplicates property.
In the case captured by the OptimizeOutputDuplicatesInsensitiveJoinRule, the aggregation remains in the plan, so the result is the same, no matter if the join is or is not eventually optimized. However, before we add other transformations depending on join dropping duplicates, we need to find a way to ensure that the execution will follow - that is, that the join will not flip.

Member Author

@sopel39 sopel39 Nov 18, 2020

However, before we add other transformations depending on join dropping duplicates, we need to find a way to ensure that the execution will follow - that is, that the join will not flip.

This looks like committing to some concrete implementation of the join that has extra properties (e.g. it cannot be flipped). IIRC some planners actually plan first with abstract node types and only then have rules that choose a concrete implementation.

On one hand, committing to some operator implementation can open new optimization possibilities, but on the other hand it can close others. A memo-based optimizer would be helpful in such a case.

@sopel39
Member Author

sopel39 commented Nov 18, 2020

That wouldn't matter, no? For cases where we want "first match" and the columns being output matter, the order can't be flipped without losing the first match semantics.

With canSkipOutputDuplicates, join sides can be flipped. More than that, canSkipOutputDuplicates propagates to both join sides (see: OptimizeOutputDuplicatesInsensitiveJoinRule). If a canSkipOutputDuplicates join consumes left_a, left_b, right_a, right_b, then such a join also doesn't care about duplicates of the left_a, left_b and right_a, right_b pairs, because such duplicates wouldn't produce any new unique left_a, left_b, right_a, right_b rows.
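The propagation argument can be illustrated with a toy model (hypothetical, not Trino code): when downstream only cares about distinct join output rows, deduplicating either join input leaves the set of distinct outputs unchanged, so the "duplicates don't matter" trait holds for the inputs as well.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model: the set of distinct output rows of an equi-join is unaffected
// by duplicates on either input side, since duplicated input rows cannot
// add new distinct (left, right) output pairs.
public class PropagationSketch
{
    static Set<String> distinctJoinOutput(List<Integer> left, List<Integer> right)
    {
        Set<String> output = new LinkedHashSet<>();
        for (int l : left) {
            for (int r : right) {
                if (l == r) {
                    output.add(l + "," + r); // distinct output rows only
                }
            }
        }
        return output;
    }

    public static void main(String[] args)
    {
        Set<String> withDuplicates = distinctJoinOutput(List.of(1, 1, 2), List.of(1, 2, 2));
        Set<String> deduplicated = distinctJoinOutput(List.of(1, 2), List.of(1, 2));
        System.out.println(withDuplicates.equals(deduplicated)); // true
    }
}
```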

Potentially we could even insert (final) aggregations upstream, but that should be CBO based since such aggregations might not always be beneficial.

@martint
Member

martint commented Nov 18, 2020

Can you explain the precise semantics of canSkipOutputDuplicates? I'm still a little confused. Is it duplicates of the join keys? Duplicates of the output rows? Duplicates of the data coming from the left of the join?

@sopel39
Member Author

sopel39 commented Nov 18, 2020

Can you explain the precise semantics of canSkipOutputDuplicates? I'm still a little confused. Is it duplicates of the join keys? Duplicates of the output rows? Duplicates of the data coming from the left of the join?

It means that duplicated rows produced by the join are irrelevant (e.g. because of an aggregation) and can be skipped (not produced by the join).

@martint
Member

martint commented Nov 18, 2020

Ok, I see. And I also see that whether the physical join implementation can take advantage of this attribute depends on the type of join and which columns are being output.

It's a weird concept, since it's some kind of "preference" that's being recorded in the node itself. I'm not sure we have anything like that today, so I need to think about it a bit more.

@sopel39
Member Author

sopel39 commented Nov 18, 2020

It's a weird concept, since it's some kind of "preference" that's being recorded in the node itself. I'm not sure we have anything like that today, so I need to think about it a bit more.

For the join node we have the isSpillable flag (whether the join can spill when needed). Whether an operator can spill is implementation specific.

@sopel39
Member Author

sopel39 commented Nov 19, 2020

Benchmarks comparison-optimize-joins.pdf

Here are benchmark results. tpcds/q95 got >2x wall time reduction and almost 3x CPU reduction. Other queries did not change.

@sopel39
Member Author

sopel39 commented Nov 23, 2020

It's a weird concept, since it's some kind of "preference" that's being recorded in the node itself.

I wouldn't call it a preference, but rather a trait of the output data. Potentially, we could keep such output data traits along with output symbols. This particular output trait is not specific to joins (it's also valid for other node types below an aggregation). However, with this PR only a join can take advantage of it.

@martint
Member

martint commented Nov 23, 2020

A trait is a characteristic of the operation. The reason it is a preference is that it's a signal of what the downstream operation is ok with or desires (i.e., doesn't care about duplicates). If it were a trait, it would be a description what the operation actually does, which would make this not a regular join, but a different kind of physical join operation and "can skip duplicate" would not be the proper name.

@sopel39
Member Author

sopel39 commented Nov 23, 2020

A trait is a characteristic of the operation. The reason it is a preference is that it's a signal of what the downstream operation is ok with or desires (i.e., doesn't care about duplicates).

Indeed, a downstream operator can have preferences and an upstream operator can produce data with traits. In this case, the preference is turned (via a rule) into a property of the join operation.

@martint Have you given a thought to this approach?

@martint martint self-requested a review December 4, 2020 16:40
Member

@martint martint left a comment


I'd rename the attribute to "maySkipDuplicates". That conveys the meaning of "is allowed to" better, since the other form can also be interpreted as "is capable of".

// Must run before AddExchanges
new PushDeleteIntoConnector(metadata),
// Must run before AddExchanges and after join reordering
new OptimizeOutputDuplicatesInsensitiveJoinRule(metadata))));
Member

Can we make this agnostic about where it runs relative to join reordering? What kind of issue does it cause for join reordering that forces this to run after?

Member Author

I've added a comment here:

// Must run after join reordering because join reordering creates
// new join nodes without JoinNode.maySkipOutputDuplicates flag set

@Override
public Result apply(AggregationNode aggregation, Captures captures, Context context)
{
return aggregation.getSource().accept(new Rewriter(metadata, context.getLookup()), null)
Member

We should avoid visitor-based rewrites as much as possible. We're trying to move away from them, as they make it hard to have a fully exploratory optimizer that considers multiple plans simultaneously.

What we need to do this generically is a description of the properties (traits) of the data coming out of the source of the aggregation, possibly, involving functional dependencies to be able to infer whether certain columns are guaranteed to be unique based on whether they are derived from other unique columns.

Member Author

@sopel39 sopel39 Jan 7, 2021

I agree. In this case, there should be a way to tell the optimizer that a rule wants an input plan node producing a given trait. The optimizer should try to satisfy such a requirement (if possible).

We had some ideas around how this can be done in: https://docs.google.com/presentation/d/1rbtJLG89GxEYLZYA09TH5cD4S6UvYXtq0OmdNqQnmws/edit#slide=id.g1fce08ab98_0_4

@sopel39 sopel39 force-pushed the ks/optimize_join branch 2 times, most recently from fbc677f to 09e4edc Compare January 7, 2021 17:04
@sopel39 sopel39 requested a review from martint January 7, 2021 17:10
@@ -844,6 +844,28 @@ MatchResult detailMatches(PlanNode node, StatsProvider stats, Session session, M
return match(newAliases.build());
}

public <T extends PlanNode> PlanMatchPattern with(Class<T> clazz, Predicate<T> predicate)
Member

maybe call this matching?

Member Author

It's a simplification of the with method below that accepts a full Matcher. Also, withXXX seems to be the convention in this class.
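A simplified, hypothetical analogue of such a type-plus-predicate matcher (not the actual PlanMatchPattern code; the nested JoinNode record here is invented for illustration): check the node's class, then apply an arbitrary predicate to the cast node.

```java
import java.util.function.Predicate;

// Hypothetical analogue of PlanMatchPattern#with(Class, Predicate): a match
// succeeds when the node has the expected type and satisfies the predicate.
public class PredicateMatcherSketch
{
    interface PlanNode {}

    // Invented stand-in for a plan node carrying a boolean property
    record JoinNode(boolean maySkipOutputDuplicates) implements PlanNode {}

    static <T extends PlanNode> boolean matches(PlanNode node, Class<T> clazz, Predicate<T> predicate)
    {
        // type check first, then the caller-supplied predicate on the cast node
        return clazz.isInstance(node) && predicate.test(clazz.cast(node));
    }

    public static void main(String[] args)
    {
        PlanNode join = new JoinNode(true);
        System.out.println(matches(join, JoinNode.class, JoinNode::maySkipOutputDuplicates)); // true
    }
}
```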


@sopel39 sopel39 requested a review from martint January 14, 2021 11:17
For empty aggregations duplicate input rows can be skipped.
Upstream joins can take advantage of this fact and skip producing
duplicate output rows.
@sopel39 sopel39 merged commit 4421ee6 into trinodb:master Jan 14, 2021
@sopel39 sopel39 mentioned this pull request Jan 14, 2021
@sopel39 sopel39 deleted the ks/optimize_join branch January 19, 2021 11:19
@martint martint added this to the 352 milestone Jan 28, 2021