Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out Iceberg splits that do not match predicates not expressible by tuple domain #9830

Conversation

homar
Copy link
Member

@homar homar commented Nov 2, 2021

fixes #9309
fixes #7608
The idea is to push predicate down to split source and not generate splits
that would be filtered out by this predicate.

@cla-bot cla-bot bot added the cla-signed label Nov 2, 2021
@homar homar requested a review from findepi November 2, 2021 11:42
@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch 2 times, most recently from af3f1f3 to c27e6c9 Compare November 2, 2021 14:54
@homar
Copy link
Member Author

homar commented Nov 3, 2021

Failing tests are
TestDruidConnectorTest>AbstractTestQueryFramework.init:97->createQueryRunner:34 » Runtime TestDruidLatestConnectorTest>AbstractTestQueryFramework.init:97->createQueryRunner:36 » Runtime
and the failures don't seem to be related

@homar homar requested a review from losipiuk November 3, 2021 13:25
@homar homar marked this pull request as ready for review November 3, 2021 13:26
@@ -511,7 +511,7 @@ private PlanRoot doPlanQuery()
private void planDistribution(PlanRoot plan)
{
// plan the execution on the active nodes
DistributedExecutionPlanner distributedPlanner = new DistributedExecutionPlanner(splitManager, metadata, dynamicFilterService);
DistributedExecutionPlanner distributedPlanner = new DistributedExecutionPlanner(splitManager, metadata, dynamicFilterService, new TypeAnalyzer(sqlParser, metadata));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inject TypeAnalyzer from guice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that but I noticed that in the doPlanQuery method it is created the same way so I thought it is better to be consisted. On the other hand I can replace both occurrences with an injected value.
WDYT ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense to change both and not construct TypeAnalyzer manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a prep commit


public LayoutConstraintEvaluator(Metadata metadata, TypeAnalyzer typeAnalyzer, Session session, TypeProvider types, Map<Symbol, ColumnHandle> assignments, Expression expression)
{
this.assignments = assignments;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rnn and ImmutableMap.copyOf (now that the class is public, it's more important)

@@ -2429,9 +2429,7 @@ public void testSplitPruningForFilterOnPartitionColumn()
verifySplitCount("SELECT * FROM " + tableName + " WHERE regionkey < 2", 2);
verifySplitCount("SELECT * FROM " + tableName + " WHERE regionkey < 0", 0);
verifySplitCount("SELECT * FROM " + tableName + " WHERE regionkey > 1 AND regionkey < 4", 2);

// TODO(https://github.com/trinodb/trino/issues/9309) should be 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark your PR as fixing this issue (and #7608 too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I do that? I already have #9309 in the description

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add "fixes #nnnn" to PR description. then take a look at this section on the sidebar:

image

@@ -158,7 +159,8 @@ public void testQueryDividedIntoSplitsFirstSplitHasRightTime()
null,
new PrometheusTableHandle("default", table.getName()),
null,
(DynamicFilter) null);
(DynamicFilter) null,
null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass ALWAYS_TRUE (aka EMPTY)

(here & below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, again I followed DynamicFiltering example ;)

@@ -203,12 +207,21 @@ private Visitor(
dynamicFilter = dynamicFilterService.createDynamicFilter(session.getQueryId(), dynamicFilters, node.getAssignments(), typeProvider);
}

ConstraintEvaluator constraintEvaluator = ConstraintEvaluator.EMPTY;
Optional<ConstraintEvaluator> existingEvaluator = filter.map(FilterNode::getPredicate)
.filter(expression -> !DynamicFilters.isDynamicFilter(expression))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to disassemble the expression into conjuncts, filtering out dynamic filters

Expression nonDynamicPredicate = [ExpressionUtils.]filterConjuncts(metadata, predicate, expression -> !DynamicFilters.isDynamicFilter(expression));

@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch 2 times, most recently from 7778b5a to 5394f6b Compare November 5, 2021 09:14
@homar
Copy link
Member Author

homar commented Nov 8, 2021

@findepi please take another look ;)

import java.util.Map;
import java.util.Set;

public interface ConstraintEvaluator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some javadoc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can give it a try ;)

Copy link
Member

@losipiuk losipiuk Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just:

Represents a constraint passed to {@link ConnectorSplitManager#getSplits}.
If {@link ConstraintEvaluator#isCandidate} returns false given actual partial column values binding, for given split, 
it means split can be safely dropped without processing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i know why we do not pass Constraint into split manager -- we don't want to pass a TupleDomain so that connector doesn't want to intersect DynamicFilter's domain with Constraint's domain.

but then, the two classes: Constraint and ConstraintEvaluator have naturally similar names, and the second looks like the evaluator of the first.

  • i am not sure about the naming. maybe we can find a better name, like TestableConstraint or something
  • should Constraint reuse this new class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @losipiuk ^

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch from 5394f6b to 60c6a8e Compare November 9, 2021 08:34
@homar
Copy link
Member Author

homar commented Nov 9, 2021

@losipiuk @findepi comments addressed please take a look

@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch from 60c6a8e to e3ba940 Compare November 9, 2021 09:34
@losipiuk
Copy link
Member

losipiuk commented Nov 9, 2021

Nothing important

@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch from e3ba940 to 49491a2 Compare November 9, 2021 10:27
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm except 49491a2#r745486692

nits

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.trino.sql.planner.iterative.rule;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.trino.sql.planner, as it's going to be used outside of iterative optimizer

{
Map<ColumnHandle, NullableValue> bindings = new HashMap<>();
for (IcebergColumnHandle partitionColumn : identityPartitionColumns) {
Object partitionValue = deserializePartitionValue(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump?

@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch from 49491a2 to fc3644d Compare November 9, 2021 11:53
@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch 2 times, most recently from 1ad71f2 to 32dc400 Compare November 10, 2021 11:13
@@ -203,12 +212,19 @@ private Visitor(
dynamicFilter = dynamicFilterService.createDynamicFilter(session.getQueryId(), dynamicFilters, node.getAssignments(), typeProvider);
}

Constraint constraint = filterPredicate
.map(predicate -> filterConjuncts(metadata, predicate, expression -> !DynamicFilters.isDynamicFilter(expression)))
.map(expression -> new LayoutConstraintEvaluator(metadata, typeAnalyzer, session, typeProvider, node.getAssignments(), expression))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you use predicate here again?

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Constraint constraint = filterPredicate
.map(predicate -> filterConjuncts(metadata, predicate, expression -> !DynamicFilters.isDynamicFilter(expression)))
.map(expression -> new LayoutConstraintEvaluator(metadata, typeAnalyzer, session, typeProvider, node.getAssignments(), expression))
.map(evaluator -> new Constraint(TupleDomain.all(), evaluator::isCandidate, evaluator.getArguments()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment
// we are interested only in functional predicate here so we set the summary to ALL.

@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch from 32dc400 to dd64c8f Compare November 10, 2021 14:01
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message nits:
Inject type analyzer int SqlQueryExecution instead of creating it "int" -> "into"
Filter out splits that do not match predicates not expressible by tup… is too long

{
try (SetThreadName ignored = new SetThreadName("Query-%s", stateMachine.getQueryId())) {
this.slug = requireNonNull(slug, "slug is null");
this.metadata = requireNonNull(metadata, "metadata is null");
this.typeOperators = requireNonNull(typeOperators, "typeOperators is null");
this.sqlParser = requireNonNull(sqlParser, "sqlParser is null");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the sqlParser parameter is unused now, remove it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used, in line 193. it is passed to analyze method

@@ -167,6 +190,9 @@ public IcebergSplitSource(
continue;
}
}
if (!partitionMatchesConstraint(identityPartitionColumns, partitionValues.get(), constraint)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Supplier's get is always getting called here, which kinda defeats the purpose. You can add an constraint.predicate().isPresent() check first, before calling partitionMatchesConstraint, or pass the entire Supplier to partitionMatchesConstraint rather than calling get here to prolong constructing the Map until actually necessary.

…le domain

fixes trinodb#9309
fixes trinodb#7608
The idea is to push constraint down to split source and not generate splits
that would be filtered out by this constraint's predicate.
@homar homar force-pushed the homar/partition_pruning_for_predicates_not_expressible_by_tupple_domain branch from dd64c8f to a901d2c Compare November 18, 2021 08:48
@homar
Copy link
Member Author

homar commented Nov 18, 2021

@alexjo2144 @losipiuk @findepi comments addressed

@findepi findepi merged commit c973251 into trinodb:master Nov 22, 2021
@findepi findepi mentioned this pull request Nov 22, 2021
12 tasks
@github-actions github-actions bot added this to the 365 milestone Nov 22, 2021
@findepi findepi changed the title Filter out splits that do not match predicates not expressible by tuple domain Filter out Iceberg splits that do not match predicates not expressible by tuple domain Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants