Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support recursive CTEs #4250

Merged
merged 2 commits into from Aug 8, 2020
Merged

Support recursive CTEs #4250

merged 2 commits into from Aug 8, 2020

Conversation

kasiafi
Copy link
Member

@kasiafi kasiafi commented Jun 26, 2020

Adds support for recursive common table expression (WITH RECURSIVE).
Single-element recursive cycles are allowed.

Issue: #1122

@cla-bot cla-bot bot added the cla-signed label Jun 26, 2020
@kasiafi kasiafi added the WIP label Jun 26, 2020
@kasiafi kasiafi force-pushed the 086RECURSIVEactual branch 10 times, most recently from d7f3e57 to a95e324 Compare July 2, 2020 20:19
@kasiafi kasiafi removed the WIP label Jul 2, 2020
@kasiafi kasiafi force-pushed the 086RECURSIVEactual branch 4 times, most recently from 3579519 to d9b6ccd Compare July 8, 2020 16:53
@martint martint self-requested a review July 20, 2020 18:40
@kasiafi kasiafi force-pushed the 086RECURSIVEactual branch 8 times, most recently from 46717d9 to 6175214 Compare July 26, 2020 10:22
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments. Still reviewing.

if (!typeCoercion.canCoerce(stepFieldTypes.get(i), anchorFieldTypes.get(i))) {
throw semanticException(
TYPE_MISMATCH,
step,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improvement for the future: use the corresponding SELECT expression as the "node" in the exception so that error highlighting points at the right place. It won't be trivial to do, so let's defer it (add a TODO)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of the first mismatching column is included in the error message. It is not always possible to highlight the exact select expression, because it might be e.g. a "row all fields" expression. I'm not sure what I should write in the TODO.

analysis.setScope(withQuery.getQuery(), aliasedAnchorScope);
analysis.registerExpandableQuery(withQuery.getQuery(), recursiveReference);
withScopeBuilder.withNamedQuery(name, withQuery);
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The many levels of nesting and this continue here make it hard to follow the flow of the code. One possible way of reorganizing this is to move this whole section to a new method and have it return "true" if it was able to handle the recursive query. Then, the code below would do:

if (!handled) {
   // 2. 
   ...
   // process plain query
   ...
}

Let me play with the code a bit to see if I can come up with something better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, instead of splitting the method into two sections based on with.isRecursive(), it seems we should be able to structure in a similar manner to this and avoid a bit of duplication:

With with = node.getWith().get();
Scope.Builder withScopeBuilder = scopeBuilder(scope);

for (WithQuery withQuery : with.getQueries()) {
    String name = withQuery.getName().getValue().toLowerCase(ENGLISH);
    if (withScopeBuilder.containsNamedQuery(name)) {
        throw semanticException(DUPLICATE_NAMED_QUERY, withQuery, "WITH query name '%s' specified more than once", name);
    }

    boolean isRecursive = false;
    if (with.isRecursive()) {
        isRecursive = tryProcessRecursiveQuery(...);
    }

    if (!isRecursive) {
        // check if all or none of the columns are explicitly alias
        Query query = withQuery.getQuery();
        if (withQuery.getColumnNames().isPresent()) {
            validateColumnAliases(withQuery.getColumnNames().get(), analysis.getOutputDescriptor(query).getVisibleFieldCount());
        }

        process(query, withScopeBuilder.build());
        withScopeBuilder.withNamedQuery(name, withQuery);
    }
}

Scope withScope = withScopeBuilder.build();
analysis.setScope(with, withScope);
return withScope;

Possibly, even this for the inner condition, but I'm not a big fan of the side-effect in the if condition:

...
if (!with.isRecursive() || !tryProcessRecursiveQuery(...)) {
   ...
}
...

@kasiafi kasiafi force-pushed the 086RECURSIVEactual branch 3 times, most recently from 5f359ad to 6f57388 Compare August 3, 2020 18:14
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can you rebased and squash the commits?

@kasiafi kasiafi force-pushed the 086RECURSIVEactual branch 4 times, most recently from b29e506 to fc3a1d9 Compare August 5, 2020 20:33
PlanNode anchorPlanRoot = new ProjectNode(idAllocator.getNextId(), anchorPlan.getRoot(), Assignments.identity(anchorOutputs));

anchorPlan = new RelationPlan(anchorPlanRoot, analysis.getScope(query), anchorOutputs, outerContext);
NodeAndMappings prunedAnchorPlan = pruneInvisibleFields(anchorPlan, idAllocator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this anymore? With the field mappings as a reference, it doesn't matter how many fields the underlying plan node produces.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary to prune anchor outputs. If we leave those extra symbols in the anchor plan, they will "leak" to the recursion step plan. Later, when anchor plan is replaced with step plan, they will become orphaned.

import static io.prestosql.sql.planner.plan.AggregationNode.groupingSets;
import static java.util.Objects.requireNonNull;

public abstract class AbstractSymbolMapper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using inheritance to share code is generally not a good approach. It introduces issues of poor encapsulation, unclear thread-safety semantics (for code that can be accessed concurrently), etc. It's better to do it via delegation. In this case, you could make SymbolMapper take an object during construction that is responsible for doing the primordial mapping. You'd customize that for each of the two use cases we need here.

@kasiafi kasiafi force-pushed the 086RECURSIVEactual branch 2 times, most recently from 1eda6b5 to 6e48c45 Compare August 7, 2020 12:57
Use collector `toImmutableMap` instead of `Collectors.toMap`
to avoid shuffling the collected assignments.
This change restores the state that the visitor methods
don't reorder outputs of ProjectNode and ApplyNode,
which changed when a check for duplicate assignments was added.
@kasiafi
Copy link
Member Author

kasiafi commented Aug 7, 2020

Updated and squashed.

@martint martint merged commit 6b01cc1 into trinodb:master Aug 8, 2020
@martint martint added this to the 340 milestone Aug 8, 2020
@martint martint mentioned this pull request Aug 8, 2020
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants