Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use boolean or aggregation in correlated exists #21422

Merged
merged 1 commit into from
May 7, 2024

Conversation

Dith3r
Copy link
Member

@Dith3r Dith3r commented Apr 5, 2024

Description

Improve performance of correlated exists by replacing count aggregation with bool_or,
which reduced the need to use additional symbol with mask. As a result, aggregation can be
marked as streaming and join can be marked as may skip duplicates.

after_q21.txt
before_q21.txt

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# General
* Improve performance of non-equality correlated exists queries. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Apr 5, 2024
@Dith3r Dith3r requested a review from sopel39 April 5, 2024 14:16
@Dith3r Dith3r force-pushed the ke/correletedexist branch 7 times, most recently from 9d083d0 to 2c2461c Compare April 12, 2024 11:29
@@ -143,7 +141,7 @@ private Optional<PlanNode> rewriteToNonDefaultAggregation(ApplyNode applyNode, C
Symbol exists = getOnlyElement(applyNode.getSubqueryAssignments().keySet());
Assignments.Builder assignments = Assignments.builder()
.putIdentities(applyNode.getInput().getOutputSymbols())
.put(exists, new Coalesce(ImmutableList.of(subqueryTrue.toSymbolReference(), Booleans.FALSE)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static import -> separate commit

@sopel39
Copy link
Member

sopel39 commented Apr 17, 2024

@kasiafi do you want to take a look?

@Dith3r Dith3r force-pushed the ke/correletedexist branch 2 times, most recently from b89dc45 to d6ec139 Compare April 18, 2024 08:03
@Dith3r Dith3r requested a review from sopel39 April 18, 2024 08:06
@Dith3r Dith3r force-pushed the ke/correletedexist branch 2 times, most recently from d893121 to 7d64ca0 Compare April 24, 2024 14:36
@Dith3r Dith3r force-pushed the ke/correletedexist branch 3 times, most recently from c31662c to 0fbec15 Compare April 24, 2024 16:37
@Dith3r Dith3r force-pushed the ke/correletedexist branch 3 times, most recently from 2ff67c7 to 4a37220 Compare April 26, 2024 10:55
@Dith3r Dith3r requested a review from sopel39 April 26, 2024 11:01
@Dith3r Dith3r requested a review from sopel39 April 26, 2024 15:25
@Dith3r Dith3r force-pushed the ke/correletedexist branch 2 times, most recently from f25f754 to 916f0c0 Compare April 26, 2024 16:00
@@ -200,7 +212,7 @@ public Result apply(CorrelatedJoinNode correlatedJoinNode, Captures captures, Co
ImmutableList.of(),
inputWithUniqueId.getOutputSymbols(),
source.getOutputSymbols(),
false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without setting this flag it directly, it is not set for left join.

Could you check why? The rule is OptimizeDuplicateInsensitiveJoins

@Dith3r Dith3r force-pushed the ke/correletedexist branch 2 times, most recently from caf353c to 9aea658 Compare April 29, 2024 13:31
@Dith3r Dith3r requested a review from sopel39 April 29, 2024 13:32
Optional.empty(),
SINGLE,
join(INNER, builder -> builder
.maySkipOutputDuplicates(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could used the new syntax in TestOptimizeDuplicateInsensitiveJoins

@@ -287,4 +289,39 @@ public void testWithPreexistingMask()
.right(project(ImmutableMap.of("non_null", expression(TRUE)),
values(ImmutableMap.of("a", 0, "mask", 1)))))))));
}

@Test
public void rewritesOnSubqueryWithBoolOr()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: testRewrites...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make naming convention inconsistent. I will make refactor PR outside this one.

if (aggregation.getFilter().isPresent() || aggregation.getMask().isPresent()) {
return Optional.empty();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably could generalize it a bit more, e.g: count with arguments

@Dith3r Dith3r force-pushed the ke/correletedexist branch 2 times, most recently from 3878135 to e7701be Compare May 7, 2024 11:19
Improve performance of correlated exists by replacing count aggregation with bool_or,
which reduced the need to use additional symbol with mask. As a result, aggregation can be
marked as streaming and join can be marked as may skip duplicates.
@sopel39 sopel39 merged commit 7652d3f into trinodb:master May 7, 2024
93 checks passed
@github-actions github-actions bot added this to the 447 milestone May 7, 2024
@Dith3r Dith3r deleted the ke/correletedexist branch May 21, 2024 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants