
Optimize execution for output duplicates insensitive joins #5981

Merged
merged 4 commits into from Jan 14, 2021

Conversation

sopel39
Member

@sopel39 sopel39 commented Nov 16, 2020

No description provided.

@sopel39
Member Author

sopel39 commented Nov 16, 2020

cc: @martint

@sopel39 sopel39 force-pushed the ks/optimize_join branch 2 times, most recently from 542e1b0 to 62ccb84 Compare November 16, 2020 21:33
Member

@kasiafi kasiafi left a comment


A few questions:

  1. Does this change affect any TPCH / DS queries? If so, how much do they benefit?
  2. Could we have similar rules for ExceptNode distinct and IntersectNode distinct? Also, could the rule's Visitor support those?
  3. How about removing the AggregationNode in the case when it has a single grouping set over all outputs and deduplication is supported for the particular join?

Member Author

@sopel39 sopel39 left a comment


@kasiafi ack

Does this change affect any TPCH / DS queries? If so, how much do they benefit?

This is to be benchmarked. I've seen lots of queries whose plans changed.

Could we have similar rules for ExceptNode distinct and IntersectNode distinct? Also, could the rule's Visitor support those?

Sure, we could use it in more places. To name a few:

  • semi join
  • correlated EXISTS with non-eq conditions instead of streaming aggregations

Could you help to identify more places where it can be used? This doesn't have to be part of this PR though.

How about removing the AggregationNode in the case when it has a single grouping set over all outputs and deduplication is supported for the particular join?

That doesn't work because only matches are deduplicated, but not join input rows.
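To illustrate the point (a hypothetical sketch, not Trino code): a first-match join deduplicates build-side matches per probe row, but duplicate probe rows still each produce an output row, so the aggregation above the join remains necessary.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a first-match inner join over integer lists. It shows why
// the AggregationNode above the join cannot simply be removed: only matches
// are deduplicated, not join input rows, so duplicate probe rows still
// yield duplicate outputs that only the aggregation removes.
public class FirstMatchJoinSketch
{
    // Emit the probe key for the first build match only (skipping duplicate matches)
    static List<Integer> firstMatchJoin(List<Integer> probe, List<Integer> build)
    {
        List<Integer> output = new ArrayList<>();
        for (int p : probe) {
            for (int b : build) {
                if (p == b) {
                    output.add(p);
                    break; // first match only: build-side duplicates are skipped
                }
            }
        }
        return output;
    }

    public static void main(String[] args)
    {
        // build side duplicates the value 1; probe side duplicates the value 2
        List<Integer> probe = List.of(1, 2, 2);
        List<Integer> build = List.of(1, 1, 2);
        // matches are deduplicated, but duplicate probe rows still appear twice
        System.out.println(firstMatchJoin(probe, build)); // prints [1, 2, 2]
    }
}
```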

@sopel39 sopel39 force-pushed the ks/optimize_join branch 3 times, most recently from a595002 to 82830cf Compare November 17, 2020 23:23
@sopel39 sopel39 requested a review from kasiafi November 17, 2020 23:23
@sopel39 sopel39 changed the title Optimize execution for cardinality insensitive joins Optimize execution for output duplicates insensitive joins Nov 18, 2020
@sopel39 sopel39 force-pushed the ks/optimize_join branch 2 times, most recently from 77d6b11 to 3bbbf33 Compare November 18, 2020 10:56
@@ -336,8 +342,13 @@ private boolean joinCurrentPosition(LookupSource lookupSource, DriverYieldSignal
joinSourcePositions++;
}

// get next position on lookup side for this probe row
joinPosition = lookupSource.getNextJoinPosition(joinPosition, probe.getPosition(), probe.getPage());
if (!matchSingleBuildRow || !currentProbePositionProducedRow) {
Member

What if we could pass this flag (matchSingleBuildRow) down so that when we build the hash table we could make sure that only one element exists? Or override getNextJoinPosition to return -1. Not sure if it would affect other places

Member

Or maybe a simple PositionalLink that could return -1 for PositionalLink#next

Member Author

That would work when there are no extra filters in JoinNode#filter. It's a good idea. Could it be a follow up?
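The short-circuit being discussed can be sketched in isolation (a simplified, hypothetical model of the probe loop; names mirror the diff but nothing here is actual Trino code): once the current probe position has produced a row and single-match mode is on, the loop stops following the chain of build positions.

```java
// Simplified, hypothetical model of the probe loop guard
// `if (!matchSingleBuildRow || !currentProbePositionProducedRow)`.
public class ProbeLoopSketch
{
    // Simulated chain of matching build positions: next[i] is the position
    // after i, or -1 when the chain ends (as in PositionLinks).
    static int countProducedRows(int[] next, int firstPosition, boolean matchSingleBuildRow)
    {
        int produced = 0;
        int joinPosition = firstPosition;
        boolean currentProbePositionProducedRow = false;
        while (joinPosition >= 0) {
            produced++; // pretend every position passes the join filter
            currentProbePositionProducedRow = true;
            if (!matchSingleBuildRow || !currentProbePositionProducedRow) {
                // get next position on lookup side for this probe row
                joinPosition = next[joinPosition];
            }
            else {
                break; // single-match mode: skip the remaining duplicates
            }
        }
        return produced;
    }

    public static void main(String[] args)
    {
        int[] chain = {1, 2, -1}; // positions 0 -> 1 -> 2
        System.out.println(countProducedRows(chain, 0, false)); // 3
        System.out.println(countProducedRows(chain, 0, true));  // 1
    }
}
```

The alternative raised above (a PositionalLink that returns -1) would move the early exit into the link structure itself instead of this guard.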

@kasiafi
Member

kasiafi commented Nov 18, 2020

Could you help to identify more places where it can be used? This doesn't have to be part of this PR though.

Here's a summary: #6005

@martint
Member

martint commented Nov 18, 2020

This looks like a semijoin in disguise. Have we looked at turning these joins into semijoins instead?

@sopel39
Member Author

sopel39 commented Nov 18, 2020

This looks like a semijoin in disguise. Have we looked at turning these joins into semijoins instead?

@martint we purposefully convert filtering semi-joins into joins + aggregation because:

  1. they can participate in CBO, which is the main advantage
  2. join build side construction is parallelized

An output-duplicates-insensitive join might be similar to some kind of semi-join with an extra filter predicate. However, such a semi-join would be very similar to the lookup-join implementation in this PR.

@martint
Member

martint commented Nov 18, 2020

they can participate in CBO, which is the main advantage

They could be turned into a semijoin after the CBO rules run

join build side construction is parallelized

That's something we should improve for semijoin operator, regardless

An output-duplicates-insensitive join might be similar to some kind of semi-join with an extra filter predicate. However, such a semi-join would be very similar to the lookup-join implementation in this PR.

Yes, parts would be similar, but semi-join doesn't need to worry about producing the values from the other side so that's a lot of code that's not relevant. My concern is about turning the join operator into a "kitchen sink" operator.

@sopel39
Member Author

sopel39 commented Nov 18, 2020

They could be turned into a semijoin after the CBO rules run

  1. If the CBO flips join sides, then there is no corresponding semi join (there is no reverse of the filtering-semi-join -> join transformation)
  2. We should avoid additional rule dependencies.

That's something we should improve for semijoin operator, regardless

I'm not convinced semi-join is that beneficial from a performance point of view. It might be a necessity for a projected semi-join, but for a filtering semi-join, pruning rows within the join performs better than a filter -> semi-join pair (e.g. because dictionaries are used). Of course you could introduce another operator like filtering-semi-join, but is that really simpler?

Yes, parts would be similar, but semi-join doesn't need to worry about producing the values from the other side so that's a lot of code that's not relevant.

@martint the resemblance to semi-join is superficial. In order to achieve a similar level of support for filtering semi-join as for first-match join, you would need to:

  1. Add support for arbitrary filters in semi-join (like x <> y in tpcds/q95)
  2. Add support for (Partitioned)LookupSource in semi-joins (to be able to evaluate arbitrary filter)
  3. Add support for filtering rows within the semi-join itself and use dictionaries for outputting selected probe rows.
  4. Add dedicated support for inequality filters (aka SortedPositionLinks)
  5. Most likely add support for spill

At this point you pretty much have another implementation of lookup join, with the caveat that the new implementation cannot output build columns (which lookup join can very easily do as well).

@sopel39
Member Author

sopel39 commented Nov 18, 2020

Additionally, the rule in this PR works when join sides are flipped, which is another difference from semi join.

@martint
Member

martint commented Nov 18, 2020

If the CBO flips join sides, then there is no corresponding semi join (there is no reverse of the filtering-semi-join -> join transformation)

That wouldn't matter, no? For cases where we want "first match" and the columns being output matter, the order can't be flipped without losing the first match semantics.

@@ -302,6 +306,12 @@ public PlanNode getRight()
return spillable;
}

@JsonProperty("canSkipOutputDuplicates")
public boolean isCanSkipOutputDuplicates()
Member

This will affect the computed estimates for the join, but I don't see anything in this PR that adjusts that logic. The resulting cardinality should be no larger than that of the left side of the join.

Member Author

@sopel39 sopel39 Nov 18, 2020

Currently, OptimizeOutputDuplicatesInsensitiveJoinRule is executed after ReorderJoins, since ReorderJoins creates new JoinNodes without the canSkipOutputDuplicates property.

I think we could improve ReorderJoins to understand that an entire join subtree matches canSkipOutputDuplicates, but that is not part of this PR.

@@ -98,6 +100,7 @@ public JoinNode(
this.criteria = ImmutableList.copyOf(criteria);
this.leftOutputSymbols = ImmutableList.copyOf(leftOutputSymbols);
this.rightOutputSymbols = ImmutableList.copyOf(rightOutputSymbols);
this.canSkipOutputDuplicates = canSkipOutputDuplicates;
Member

canSkipOutputDuplicates = true only makes sense if the only columns being output come from the left side of the join (or are equi-join keys in the case of an inner join). Also, this doesn't work for RIGHT or FULL join, so we should validate we're not creating an inconsistent node.

Also, given the semantics of this property, I'm not sure canSkipOutputDuplicates is the best name. It'd be more accurate to call it "outputsFirstMatchOnly" or something similar.

Member Author

@sopel39 sopel39 Nov 18, 2020

Also, this doesn't work for RIGHT or FULL join, so we should validate we're not creating an inconsistent node.

@martint Initially I thought about making JoinNode#canSkipOutputDuplicates implementation specific (to model how LookupJoinOperator and cross join work). However, this complicates reasoning and is more prone to rule-ordering issues. For example, with the current approach the following scenario would work:

  1. OptimizeOutputDuplicatesInsensitiveJoinRule could mark RIGHT join as canSkipOutputDuplicates
  2. later, some other rule could flip the join sides, but the canSkipOutputDuplicates property would be preserved.

Similarly, other rules like PPD, unaliasing, or pruning can fire while the canSkipOutputDuplicates property is preserved. If I connected canSkipOutputDuplicates to some concrete join implementation, that would require additional checks for whether canSkipOutputDuplicates can still be preserved. In the worst case, the canSkipOutputDuplicates property would be lost during planning.

By making canSkipOutputDuplicates a property of the join algebra itself (and not of the join implementation), the planning logic gets simpler and more stable.

It seems that eventually we should have different concrete plan nodes for different join implementations (e.g. LookupJoin, SortJoin, CrossJoin). That would have the additional benefit of being able to compare plans with different join implementations.

Member

Like I suggested in #6005, we need to sort it out before we start optimizing other plans based on the canSkipOutputDuplicates property.
In the case captured by the OptimizeOutputDuplicatesInsensitiveJoinRule, the aggregation remains in the plan, so the result is the same, no matter if the join is or is not eventually optimized. However, before we add other transformations depending on join dropping duplicates, we need to find a way to ensure that the execution will follow - that is, that the join will not flip.

Member Author

@sopel39 sopel39 Nov 18, 2020

However, before we add other transformations depending on join dropping duplicates, we need to find a way to ensure that the execution will follow - that is, that the join will not flip.

This looks like committing to some concrete implementation of the join that has extra properties (e.g. it cannot be flipped). IIRC some planners actually plan first with abstract node types and only then have rules that choose a concrete implementation.

On one hand, committing to some operator implementation can open new optimization possibilities, but on the other hand it can close others. A memo-based optimizer would be helpful in such a case.

@sopel39
Member Author

sopel39 commented Nov 18, 2020

That wouldn't matter, no? For cases where we want "first match" and the columns being output matter, the order can't be flipped without losing the first match semantics.

With canSkipOutputDuplicates, join sides can be flipped. More than that, canSkipOutputDuplicates propagates to both join sides (see: OptimizeOutputDuplicatesInsensitiveJoinRule). If a canSkipOutputDuplicates join consumes left_a, left_b, right_a, right_b, then such a join also doesn't care about duplicates of the left_a, left_b and right_a, right_b pairs, because such duplicates wouldn't produce any new unique left_a, left_b, right_a, right_b rows.
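The propagation argument can be illustrated with a toy model (hypothetical, not Trino code): when downstream only cares about distinct join output rows, deduplicating either join input leaves the set of distinct outputs unchanged, so the "duplicates don't matter" trait holds for the inputs as well.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model: the set of distinct output rows of an equi-join is unaffected
// by duplicates on either input side, since duplicated input rows cannot
// add new distinct (left, right) output pairs.
public class PropagationSketch
{
    static Set<String> distinctJoinOutput(List<Integer> left, List<Integer> right)
    {
        Set<String> output = new LinkedHashSet<>();
        for (int l : left) {
            for (int r : right) {
                if (l == r) {
                    output.add(l + "," + r); // distinct output rows only
                }
            }
        }
        return output;
    }

    public static void main(String[] args)
    {
        Set<String> withDuplicates = distinctJoinOutput(List.of(1, 1, 2), List.of(1, 2, 2));
        Set<String> deduplicated = distinctJoinOutput(List.of(1, 2), List.of(1, 2));
        System.out.println(withDuplicates.equals(deduplicated)); // true
    }
}
```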

Potentially we could even insert (final) aggregations upstream, but that should be CBO based since such aggregations might not always be beneficial.

@martint
Member

martint commented Nov 18, 2020

Can you explain the precise semantics of canSkipOutputDuplicates? I'm still a little confused. Is it duplicates of the join keys? Duplicates of the output rows? Duplicates of the data coming from the left of the join?

@sopel39
Member Author

sopel39 commented Nov 18, 2020

Can you explain the precise semantics of canSkipOutputDuplicates? I'm still a little confused. Is it duplicates of the join keys? Duplicates of the output rows? Duplicates of the data coming from the left of the join?

It means that duplicated rows produced by the join are irrelevant (e.g. because of an aggregation) and can be skipped (not produced by the join).

@martint
Member

martint commented Nov 18, 2020

Ok, I see. And I also see that whether the physical join implementation can take advantage of this attribute depends on the type of join and which columns are being output.

It's a weird concept, since it's some kind of "preference" that's being recorded in the node itself. I'm not sure we have anything like that today, so I need to think about it a bit more.

@sopel39
Member Author

sopel39 commented Nov 18, 2020

It's a weird concept, since it's some kind of "preference" that's being recorded in the node itself. I'm not sure we have anything like that today, so I need to think about it a bit more.

For the join node we have the isSpillable flag (whether the join can spill when needed). Whether an operator can spill is implementation specific.

@sopel39
Member Author

sopel39 commented Nov 19, 2020

Benchmarks comparison-optimize-joins.pdf

Here are benchmark results. tpcds/q95 got >2x wall time reduction and almost 3x CPU reduction. Other queries did not change.

@sopel39
Member Author

sopel39 commented Nov 23, 2020

It's a weird concept, since it's some kind of "preference" that's being recorded in the node itself.

I wouldn't call it a preference, but rather a trait of the output data. Potentially, we could keep such output data traits along with output symbols. This particular output trait is not specific to joins (it's also valid for other node types below an aggregation). However, with this PR only a join can take advantage of it.

@martint
Member

martint commented Nov 23, 2020

A trait is a characteristic of the operation. The reason it is a preference is that it's a signal of what the downstream operation is ok with or desires (i.e., doesn't care about duplicates). If it were a trait, it would be a description what the operation actually does, which would make this not a regular join, but a different kind of physical join operation and "can skip duplicate" would not be the proper name.

@sopel39
Member Author

sopel39 commented Nov 23, 2020

A trait is a characteristic of the operation. The reason it is a preference is that it's a signal of what the downstream operation is ok with or desires (i.e., doesn't care about duplicates).

Indeed, a downstream operator can have preferences and an upstream operator can produce data with traits. In this case, the preference is turned (via a rule) into a property of the join operation.

@martint Have you given a thought to this approach?

@martint martint self-requested a review December 4, 2020 16:40
Member

@martint martint left a comment


I'd rename the attribute to "maySkipDuplicates". That conveys the meaning of "is allowed to" better, since the other form can also be interpreted as "is capable of".

// Must run before AddExchanges
new PushDeleteIntoConnector(metadata),
// Must run before AddExchanges and after join reordering
new OptimizeOutputDuplicatesInsensitiveJoinRule(metadata))));
Member

Can we make this agnostic about where it runs relative to join reordering? What kind of issue does it cause for join reordering that forces this to run after?

Member Author

I've added a comment here:

// Must run after join reordering because join reordering creates
// new join nodes without JoinNode.maySkipOutputDuplicates flag set

@Override
public Result apply(AggregationNode aggregation, Captures captures, Context context)
{
return aggregation.getSource().accept(new Rewriter(metadata, context.getLookup()), null)
Member

We should avoid visitor-based rewrites as much as possible. We're trying to move away from them, as they make it hard to have a fully exploratory optimizer that considers multiple plans simultaneously.

What we need to do this generically is a description of the properties (traits) of the data coming out of the source of the aggregation, possibly, involving functional dependencies to be able to infer whether certain columns are guaranteed to be unique based on whether they are derived from other unique columns.

Member Author

@sopel39 sopel39 Jan 7, 2021

I agree. In this case, there should be a way to tell the optimizer that a rule wants an input plan node producing a given trait. The optimizer should try to satisfy such a requirement (if possible).

We had some ideas around how this can be done in: https://docs.google.com/presentation/d/1rbtJLG89GxEYLZYA09TH5cD4S6UvYXtq0OmdNqQnmws/edit#slide=id.g1fce08ab98_0_4

@sopel39 sopel39 force-pushed the ks/optimize_join branch 2 times, most recently from fbc677f to 09e4edc Compare January 7, 2021 17:04
@sopel39 sopel39 requested a review from martint January 7, 2021 17:10
@@ -844,6 +844,28 @@ MatchResult detailMatches(PlanNode node, StatsProvider stats, Session session, M
return match(newAliases.build());
}

public <T extends PlanNode> PlanMatchPattern with(Class<T> clazz, Predicate<T> predicate)
Member

maybe call this matching?

Member Author

It's a simplification of the with method below that accepts a full Matcher. Also, withXXX seems to be the convention in this class.
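A simplified, hypothetical analogue of such a type-plus-predicate matcher (not the actual PlanMatchPattern code; the nested JoinNode record here is invented for illustration): check the node's class, then apply an arbitrary predicate to the cast node.

```java
import java.util.function.Predicate;

// Hypothetical analogue of PlanMatchPattern#with(Class, Predicate): a match
// succeeds when the node has the expected type and satisfies the predicate.
public class PredicateMatcherSketch
{
    interface PlanNode {}

    // Invented stand-in for a plan node carrying a boolean property
    record JoinNode(boolean maySkipOutputDuplicates) implements PlanNode {}

    static <T extends PlanNode> boolean matches(PlanNode node, Class<T> clazz, Predicate<T> predicate)
    {
        // type check first, then the caller-supplied predicate on the cast node
        return clazz.isInstance(node) && predicate.test(clazz.cast(node));
    }

    public static void main(String[] args)
    {
        PlanNode join = new JoinNode(true);
        System.out.println(matches(join, JoinNode.class, JoinNode::maySkipOutputDuplicates)); // true
    }
}
```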


@sopel39 sopel39 requested a review from martint January 14, 2021 11:17
For empty aggregations duplicate input rows can be skipped.
Upstream joins can take advantage of this fact and skip producing
duplicate output rows.
@sopel39 sopel39 merged commit 4421ee6 into trinodb:master Jan 14, 2021
@sopel39 sopel39 mentioned this pull request Jan 14, 2021
@sopel39 sopel39 deleted the ks/optimize_join branch January 19, 2021 11:19
@martint martint added this to the 352 milestone Jan 28, 2021