Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Property pruner #3750

Merged
merged 28 commits into from
Feb 11, 2022
Merged

Property pruner #3750

merged 28 commits into from
Feb 11, 2022

Conversation

jievince
Copy link
Contributor

@jievince jievince commented Jan 18, 2022

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:
Close #3096.

Description:

How do you solve it?

Collect the attributes used by each plan node from the top down and record them in the PropertyTracker. For a node (AppendVertices, Traverse), they only need to request the storaged attributes recorded in the PropertyTracker (that is, the attributes used by its upper nodes).

Afte property pruner, some plan nodes may need to be deleted.
eg. AppendVertices node could be deleted if all the vertexProps it requests are not used by the upper nodes.

PropertyTrackerVisitor is used to track the vertex/edge props used by the expressions of plan nodes.
Currently, we need to care about TagPropertyExpression and LabelTagProertyExpression for vertex properties, EdgePropertyExpression and AttributeExpression? for edge properties. If these exprs are not found, then InputPropertyExpresson and VariablePropertyExpression are collected.

TODO:

  • Add a method called markTheNodeAsDeleted, which implies the node is useless after the property pruner and should be deleted.
  • At present this PR only prunes vertex attributes, when the PR was almost reviewed, I can add the pruner of edge attribute.
  • Implement ColumnPruner, column pruner should be done before property pruner in the optimizer.

Special notes for your reviewer, ex. impact of this fix, design document, etc:

@jievince jievince added the ready-for-testing PR: ready for the CI test label Jan 18, 2022
@Shylock-Hg
Copy link
Contributor

Any design document about this?

@jievince
Copy link
Contributor Author

Any design document about this?

Sorry, there is no design doc here. Bug after the PR is finished, I will make a brief description.

@jievince jievince force-pushed the property-pruner branch 2 times, most recently from 3f2257b to bddd2a9 Compare January 21, 2022 08:34
@@ -11,6 +11,8 @@
#include "graph/planner/plan/Query.h"
#include "graph/util/ExpressionUtils.h"

DEFINE_bool(enable_opt_collapse_project_rule, true, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why disable this rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not disabled, it's just a switch. Actually, all opt rules need a switch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RBO should not be designed to be a tradeoff. Why need a switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For test or debug.
  2. RBO rule doesn't always make the plan better.

Comment on lines 1064 to 1066
Status ExpressionUtils::extractPropsFromExprs(const Expression *expr,
PropertyTracker &propsUsed,
const graph::QueryContext *qctx,
GraphSpaceID spaceID,
const std::string &entityAlias) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a more generic interface for utility functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. Could you have a explain?

@@ -223,6 +225,17 @@ class Expression {

std::ostream& operator<<(std::ostream& os, Expression::Kind kind);

struct PropertyTracker {
std::unordered_map<std::string, std::unordered_map<TagID, std::unordered_set<std::string>>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use TagID rather than TagName?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the plan node such as AppendVertices or Traverse 's data membervertexprops_ store the tagId.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can materialize TagID before you pass it to Traverse or AppendVertices instead of doing it in a visitor, which is debug friendly as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both are ok, other visitors like DeducePropsVisitor has used tagId:

void DeducePropsVisitor::visit(LabelTagPropertyExpression *expr) {
  auto status = qctx_->schemaMng()->toTagID(space_, expr->sym());

@jievince jievince force-pushed the property-pruner branch 2 times, most recently from 39703ff to c116258 Compare January 24, 2022 06:19
@jievince jievince marked this pull request as ready for review January 24, 2022 06:53
@jievince jievince changed the title [WIP]Property pruner Property pruner Jan 24, 2022
propsUsed_.colsSet.emplace(colName);
}

// void PropertyTrackerVisitor::visit(AttributeExpression *expr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why disable AttributeExpression

Copy link
Contributor Author

@jievince jievince Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the TODO of the desc of this PR:

At present, this PR only prunes vertex attributes, when the PR was almost reviewed, I can add the pruner of edge attribute.

Copy link
Contributor

@CPWstatic CPWstatic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others in this pr are all looks good to me. Please move on it. Look forward to the effect of this pr.

auto rootGroup = std::move(status).value();
auto spaceID = qctx->rctx()->session()->space().id;

// auto status = preprocess(root, qctx, spaceID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should figure out the relation between prepare and preprocess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think preprocess is operated on a raw plan, while prepare which invoked convertToGroup is to convert the raw plan to a group, which is used in the exploration phase.

The 3 phase of optimizer of TIDB:

// Phase 1: Preprocessing

// The target of this phase is to preprocess the plan tree by some heuristic
// rules which should always be beneficial, for example Column Pruning.

//------------------------------------------------------------------------------
// Phase 2: Exploration
//------------------------------------------------------------------------------
//
// The target of this phase is to explore all the logically equivalent
// expressions by exploring all the equivalent group expressions of each group.

//------------------------------------------------------------------------------
// Phase 3: Implementation
//------------------------------------------------------------------------------
//
// The target of this phase is to search the best physical plan for a Group
// which satisfies a certain required physical property.

@@ -223,6 +224,17 @@ class Expression {

std::ostream& operator<<(std::ostream& os, Expression::Kind kind);

struct PropertyTracker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should not be part of this file i think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will find a better place to put it into.

@@ -379,6 +379,22 @@ void PlanNode::updateSymbols() {
}
}

Status PlanNode::pruneProperties(PropertyTracker& propsUsed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do this by using the visitor pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Putting the method pruneProperties to the Plan node is a little intrusive. I will use the visitor pattern.

@@ -1,3 +1,4 @@
@wang
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will delete it.

# Below scenario is not suppoted for the execution plan has a scan.
When executing query:
When profiling query:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to add a execution plan check here

@@ -26,7 +27,23 @@ Feature: Multi Query Parts
| "Tim Duncan" | "Boris Diaw" | "Spurs" |
| "Tim Duncan" | "Boris Diaw" | "Suns" |
| "Tim Duncan" | "Boris Diaw" | "Tim Duncan" |
When executing query:
# And the execution plan should be:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why comment these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something wrong in the check mechanism of operator info of tck currently.

@jievince
Copy link
Contributor Author

jievince commented Jan 28, 2022

The others in this pr are all looks good to me. Please move on it. Look forward to the effect of this pr.

Ok, fine!

@jievince jievince force-pushed the property-pruner branch 3 times, most recently from 3d2c42c to b573dcb Compare February 10, 2022 09:57
@@ -20,6 +21,8 @@ using nebula::graph::QueryContext;
using nebula::graph::Select;
using nebula::graph::SingleDependencyNode;

DEFINE_bool(enable_optimizer_property_pruner_rule, true, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should reuse enable_optimizer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could enable_optimizer be able to control whether to enable a single rule?

Copy link
Contributor

@Shylock-Hg Shylock-Hg Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we don't provide a way to control single optimization. And in my mind, enable_optimizer must control all optimizations include this of course.

@yixinglu yixinglu merged commit 25a0621 into vesoft-inc:master Feb 11, 2022
liwenhui-soul pushed a commit to liwenhui-soul/nebula that referenced this pull request Feb 15, 2022
* property pruner

* postprocess works.

* add Travese::pruneProperties

* make vertexprops of travese be null when node alias is not used

* fix extractPropsFromExps for tagPropExpr and edgePropexpr, vFilter and eFilter

* add PropertyTracker::update for project node

* add DeduceMatchPropsVisitor and FLAGS_enable_opt_collapse_project_rule

* add Filter::pruneProperties and only 3 tck cases not passed

* Do not do propsUsed.colsSet.erase(it)

* revert labeltagpropexpr and pass all tck

* add hasAlias, add deduce types for id, src, dst func

* rename visitor

* ingore func id, src, dst, type, typeid, rank, hash in property tracker visitor

* add Aggregate::pruneProperties and store the alias of id/src/dst... to propsused

* format tck

* remove some unusedless headers

* add tck for plan

* rename has1, has2, has3

* add flag_enable_optimizer_property_pruner_rule and support prune edge props...

* add kUnknownEdgeType

* add PrunePropertiesVisitor

* remove PlanNode::pruneProperties, and move PropertyTracker to PropertyTrackerVisitor

* add PrunePropertiesRule.feature

* add markDeleted interface

* remove unused code

* fix header macro

* update tck

Co-authored-by: kyle.cao <kyle.cao@vesoft.com>
@czpmango czpmango mentioned this pull request Feb 23, 2022
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Properties reduction (Remove unnecessary properties, only fetch required properties)
6 participants