feat(interactive): support project properties of a path (alibaba#3213)


As titled: this PR adds support for projecting properties of a `path`, i.e., the property is projected from each element in the `path`.

For example, on the modern graph:
```
gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'ALL_V').values("name")
==>[marko, vadas]
==>[marko, josh]
```
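To make the semantics of this query concrete, here is a minimal C++ sketch of what it computes. It assumes a toy in-memory graph holding only the modern graph's `knows` edges (marko→vadas, marko→josh) and reads `"1..3"` as the half-open hop range [1, 3), as the `PathExpand` documentation in this commit describes. `Graph`, `expand_paths`, `project_names` and `query` are hypothetical names for illustration, not GraphScope APIs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy graph: vertex id -> name, and outgoing "knows" edges.
struct Graph {
  std::map<int, std::string> names;
  std::map<int, std::vector<int>> knows;
};

using Path = std::vector<int>;  // vertex ids along the path (ALL_V)

// Depth-first enumeration of every path from current[0] whose hop count
// lies in [lower, upper).
void expand_paths(const Graph& g, Path& current, int lower, int upper,
                  std::vector<Path>& out) {
  int hops = static_cast<int>(current.size()) - 1;
  if (hops >= lower) out.push_back(current);
  if (hops + 1 >= upper) return;  // one more hop would leave the range
  auto it = g.knows.find(current.back());
  if (it == g.knows.end()) return;
  for (int next : it->second) {
    current.push_back(next);
    expand_paths(g, current, lower, upper, out);
    current.pop_back();
  }
}

// Project the "name" property of every element in each path.
std::vector<std::vector<std::string>> project_names(
    const Graph& g, const std::vector<Path>& paths) {
  std::vector<std::vector<std::string>> result;
  for (const Path& p : paths) {
    std::vector<std::string> names;
    for (int v : p) names.push_back(g.names.at(v));
    result.push_back(names);
  }
  return result;
}

// Start the expansion from every vertex, like g.V().out(...).
std::vector<std::vector<std::string>> query(const Graph& g, int lower,
                                            int upper) {
  std::vector<Path> paths;
  for (const auto& entry : g.names) {
    Path current{entry.first};
    expand_paths(g, current, lower, upper, paths);
  }
  return project_names(g, paths);
}
```

With vertices 1 to 6 named as in the modern graph and `knows = {{1, {2, 4}}}`, `query(g, 1, 3)` yields `[marko, vadas]` and `[marko, josh]`, matching the console output above. In this sketch expansion and projection are separate steps, so swapping `"name"` for another key only touches `project_names`.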


Fixes alibaba#3199

test for multi-match logical plan and physical plan

Committed-by: bingqing.lbq from Dev container

[Fix]
1. Fix the bug of multiple pattern matching
2. Add more test cases for pattern matching

[Test]
1. Edit testcases
2. Fix some bugs
3. Now the new multiple match implementation with dummy source can pass all the tests in the rust side

minor: test for multi-source-multi-match logical plan and physical plan

[CI Tests] more multi source related ci tests

[Bug Fix]
1. Rework the methods that get subplans and branch plans so that they now support nested branch plans
2. Introduce a new Branch Logical Operator for helping handle branches

[Bug Fix and Testcases]
1. Fix bugs in the `get_merge_node` and `get_branch_node` methods when facing more complicated logical plans; they now use stricter conditions for the final merge/first branch checks
2. Fix a bug in `append_branch_plan`: when a branch plan contains a node whose id already exists in the original plan, the existing node is no longer overwritten; instead, the children of the two nodes are merged. This solves the problem of finding subplans when two branch nodes overlap
3. Add more test cases to validate the new `subplan` and `get_branch_plans` method

[Bug Fix and Testcases]

1. Add 6 types of logical plans for testing
1.1 Verify the correctness of `get_merge_node`, `get_branch_node`, `subplan`, and `get_branch_plans` with these logical plans
1.2 Verify whether these logical plans can be converted to the expected physical plans
1.3 Verify whether queries based on these logical plans generate the expected results

2. Use a flow algorithm in `get_merge_node` and `get_branch_node`, greatly improving efficiency
2.1 Implement a Fraction abstraction to keep flow comparisons exact

3. Fix a problem in `append_branch_plans`: it now supports appending branch plans that contain nodes already existing in the original plan

4. Completely rewrite the logic of `get_merge_node`, `get_branch_node`, `subplan`, and `get_branch_plans`
4.1 In theory, it can now support any reasonable DAG logical plan

5. Fix a problem in `extract_subplans`: the last node of a subplan can now be a merge node

6. Make the code much safer: it now checks the existence of merge node subplans much more strictly
6.1 Remove `expect`; return None or an Error instead

7. Add more comments explaining the logic for handling nested branches

Import a third-party fraction library to replace the hand-written one
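One plausible reading of the flow-based merge-node search and the Fraction abstraction described above is sketched below in C++. The sketch assumes that one unit of flow leaves the branch node and that every node splits its incoming flow evenly among its children. The first node, in topological order, that receives the full unit is then the point where all branches rejoin. Exact rationals keep splits such as 1/3 + 1/3 + 1/3 equal to exactly 1. `Fraction`, `Dag` and `find_merge_node` are hypothetical names; this is not the GraphScope code, which per the message above uses a third-party fraction library.

```cpp
#include <cstdint>
#include <map>
#include <numeric>
#include <optional>
#include <queue>
#include <set>
#include <vector>

// Exact rational number, kept in lowest terms (den > 0 is assumed).
struct Fraction {
  int64_t num;
  int64_t den;

  Fraction(int64_t n = 0, int64_t d = 1) : num(n), den(d) {
    int64_t g = std::gcd(num, den);  // den > 0, so g > 0
    num /= g;
    den /= g;
  }
  Fraction operator+(const Fraction& o) const {
    return Fraction(num * o.den + o.num * den, den * o.den);
  }
  Fraction divided_by(int64_t k) const { return Fraction(num, den * k); }
  bool operator==(const Fraction& o) const {
    return num == o.num && den == o.den;
  }
};

using Dag = std::map<int, std::vector<int>>;  // node id -> child ids

// Returns the first node (in topological order) that receives the whole
// unit of flow pushed out of `branch`, or nullopt if flow leaks to a leaf.
std::optional<int> find_merge_node(const Dag& dag, int branch) {
  static const std::vector<int> kNoChildren;
  auto children = [&](int n) -> const std::vector<int>& {
    auto it = dag.find(n);
    return it == dag.end() ? kNoChildren : it->second;
  };

  // In-degrees restricted to the sub-DAG reachable from `branch`.
  std::map<int, int> indeg{{branch, 0}};
  std::set<int> seen{branch};
  std::vector<int> stack{branch};
  while (!stack.empty()) {
    int n = stack.back();
    stack.pop_back();
    for (int c : children(n)) {
      ++indeg[c];
      if (seen.insert(c).second) stack.push_back(c);
    }
  }

  // Kahn's algorithm: a node is dequeued only after all its parents,
  // so its accumulated flow is final at that point.
  std::map<int, Fraction> flow{{branch, Fraction(1)}};
  std::queue<int> ready;
  ready.push(branch);
  while (!ready.empty()) {
    int n = ready.front();
    ready.pop();
    if (n != branch && flow[n] == Fraction(1)) return n;
    const auto& cs = children(n);
    for (int c : cs) {
      flow[c] = flow[c] + flow[n].divided_by(static_cast<int64_t>(cs.size()));
      if (--indeg[c] == 0) ready.push(c);
    }
  }
  return std::nullopt;
}
```

For a diamond `0 -> {1, 2} -> 3` the result is node 3. For a nested plan where branch 1 reopens and closes at node 5 before joining branch 2 at node 6, searching from 0 yields 6 and searching from 1 yields 5, which is the behaviour item 4.1 asks for.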

[GIE Compiler] support optional match in compiler

add FilterIntoJoin Rule

[GIE Compiler] convert optional match to left outer join

[GIE Compiler] add 'NotExistToAntiJoinRule' to convert not exist subquery to anti join
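The two rewrites above rely on textbook relational semantics. `OPTIONAL MATCH` keeps every left row and pads missing matches with null, which is a left outer join. `NOT EXISTS` keeps only left rows with no match at all, which is an anti join. The C++ sketch below illustrates those semantics on a plain edge list, using `std::nullopt` for Cypher's null. It is not the compiler's rule implementation, and `Edge`, `left_outer_join` and `anti_join` are hypothetical names.

```cpp
#include <optional>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (src, dst)

// OPTIONAL MATCH as a left outer join: every left row survives; rows
// without a matching edge are padded with std::nullopt.
std::vector<std::pair<int, std::optional<int>>> left_outer_join(
    const std::vector<int>& left, const std::vector<Edge>& right) {
  std::vector<std::pair<int, std::optional<int>>> out;
  for (int l : left) {
    bool matched = false;
    for (const auto& [src, dst] : right) {
      if (src == l) {
        out.emplace_back(l, dst);
        matched = true;
      }
    }
    if (!matched) out.emplace_back(l, std::nullopt);
  }
  return out;
}

// NOT EXISTS as an anti join: keep left rows that have no match at all.
std::vector<int> anti_join(const std::vector<int>& left,
                           const std::vector<Edge>& right) {
  std::vector<int> out;
  for (int l : left) {
    bool matched = false;
    for (const auto& e : right) {
      if (e.first == l) {
        matched = true;
        break;
      }
    }
    if (!matched) out.push_back(l);
  }
  return out;
}
```

With persons `{1, 2, 4}` and the modern graph's `knows` edges `{(1, 2), (1, 4)}`, the left outer join yields `(1, 2), (1, 4), (2, null), (4, null)`, and the anti join yields `{2, 4}`: the persons who know nobody.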

[GIE Compiler] fix bugs in ic queries

[GIE Compiler] fix unit tests

[GIE Compiler] support IS_NULL and IS_NOT_NULL in cypher queries

[GIE Compiler] minor fix

[GIE Compiler] minor fix

[GIE Compiler] remove IS_NULL

[GIE Compiler] remove ffi build unit test from compiler

[GIE Compiler] add doc

[GIE Compiler] support IS_NULL and IS_NOT_NULL in cypher queries

[GIE Compiler] minor fix

[GIE Compiler] support ListLiteral in cypher queries

[IR Core] add First in FfiAggOpt

[IR Runtime] support VarMap in Project

[GIE Compiler] support types : 'GraphPathType' and 'List<Any>'

[GIE Compiler] minor fix

[GIE Compiler] support extract operator in compiler

[GIE Compiler] fix bugs

[GIE Compiler] minor fix

add support for ic1

todo ic10

fix ic10 support

modify ic1 cypher

stash

todo: fix correctness

fix ic10

fix
BingqingLyu authored and zhanglei1949 committed Sep 13, 2023
1 parent a805363 commit a101f1f
Showing 68 changed files with 3,578 additions and 419 deletions.
21 changes: 11 additions & 10 deletions .github/workflows/hqps-db-ci.yml
@@ -112,19 +112,20 @@ jobs:
echo "graph.store: exp" >> /tmp/ir.compiler.properties
echo "graph.planner.is.on: true" >> /tmp/ir.compiler.properties
echo "graph.planner.opt: RBO" >> /tmp/ir.compiler.properties
echo "graph.planner.rules: FilterMatchRule" >> /tmp/ir.compiler.properties
echo "graph.planner.rules: FilterMatchRule,NotExistToAntiJoinRule," >> /tmp/ir.compiler.properties
cd ${GITHUB_WORKSPACE}/flex/bin
for i in 2 3 5 6 8 9 11 12;
do
cmd="./load_plan_and_gen.sh -e=hqps -i=../resources/queries/ic/adhoc/ic${i}_adhoc.cypher -w=/tmp/codgen/"
cmd=${cmd}" -o=/tmp/plugin --ir_conf=/tmp/ir.compiler.properties "
cmd=${cmd}" --graph_schema_path=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/ldbc_schema_csr_ic.json"
cmd=${cmd}" --gie_home=${GIE_HOME}"
echo $cmd
eval ${cmd}
done
# for i in 1 2 3 5 6 7 8 9 10 11 12;
for i in 13;
do
cmd="./load_plan_and_gen.sh -e=hqps -i=../resources/queries/ic/adhoc/ic${i}_adhoc.cypher -w=/tmp/codgen/"
cmd=${cmd}" -o=/tmp/plugin --ir_conf=/tmp/ir.compiler.properties "
cmd=${cmd}" --graph_schema_path=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/ldbc_schema_csr_ic.json"
cmd=${cmd}" --gie_home=${GIE_HOME}"
echo $cmd
eval ${cmd}
done

for i in 1 2 3 4 5 6 7 8 9;
do
11 changes: 10 additions & 1 deletion docs/interactive_engine/tinkerpop/supported_gremlin_steps.md
@@ -578,7 +578,7 @@ The following steps are extended to denote more complex situations.
In Graph querying, expanding a multiple-hops path from a starting point is called `PathExpand`, which is commonly used in graph scenarios. In addition, there are different requirements for expanding strategies in different scenarios, i.e. it is required to output a simple path or all vertices explored along the expanding path. We introduce the with()-step to configure the corresponding behaviors of the `PathExpand`-step.
#### out()
Expand a multiple-hops path along the outgoing edges, which length is within the given range.
Expand a multiple-hops path along the outgoing edges, which length is within the given range.
Parameters: </br>
lengthRange - the lower and the upper bounds of the path length, </br> edgeLabels - the edge labels to traverse.
@@ -603,6 +603,9 @@ g.V().out("1..10", "knows")
# expand hops within the range of [1, 10) along the outgoing edges which label is `knows` or `created`,
# vertices can be duplicated and only the end vertex should be kept
g.V().out("1..10", "knows", "created")
# expand hops within the range of [1, 10) along the outgoing edges,
# and project the property "name" of every vertex along the path
g.V().out("1..10").with('RESULT_OPT', 'ALL_V').values("name")
```
Running Example:
```bash
@@ -615,6 +618,12 @@ gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'ALL_V_E')
gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'END_V').endV()
==>v[2]
==>v[4]
gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'ALL_V').values("name")
==>[marko, vadas]
==>[marko, josh]
gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'ALL_V').valueMap("id","name")
==>{id=[[1, 2]], name=[[marko, vadas]]}
==>{id=[[1, 4]], name=[[marko, josh]]}
```
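The `valueMap("id","name")` example returns, for each path, one list per requested key, with values in path order. The C++ sketch below shows that per-path, per-key projection. It assumes a hypothetical `Props` map that stores every property as a string, and it does not model the extra list nesting that appears in the console output. `Props` and `value_map_over_path` are illustrative names, not GraphScope APIs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical property store: vertex id -> (property key -> value).
using Props = std::map<int, std::map<std::string, std::string>>;

// For one path, collect each requested key's value at every element,
// preserving path order: key -> [value at v0, value at v1, ...].
std::map<std::string, std::vector<std::string>> value_map_over_path(
    const Props& props, const std::vector<int>& path,
    const std::vector<std::string>& keys) {
  std::map<std::string, std::vector<std::string>> result;
  for (const std::string& key : keys) {
    std::vector<std::string>& column = result[key];
    for (int v : path) {
      const auto& vp = props.at(v);
      auto it = vp.find(key);
      // A missing property becomes an empty string in this sketch.
      column.push_back(it == vp.end() ? "" : it->second);
    }
  }
  return result;
}
```

For the path `[1, 2]` with keys `id` and `name`, the sketch produces `id -> [1, 2]` and `name -> [marko, vadas]`, mirroring the first result line above.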
#### in()
Expand a multiple-hops path along the incoming edges, which length is within the given range.
1 change: 1 addition & 0 deletions flex/CMakeLists.txt
@@ -20,6 +20,7 @@ set(DEFAULT_BUILD_TYPE "Release")

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17 -mno-avx512f -fPIC")
set(CMAKE_CXX_FLAGS_DEBUG "-g3 -O0")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -g3 -flto")


add_compile_definitions(FLEX_VERSION="${FLEX_VERSION}")
4 changes: 4 additions & 0 deletions flex/codegen/src/building_context.h
@@ -113,6 +113,10 @@ struct TagIndMapping {
return tag_ind_2_tag_ids_;
}

const std::vector<int32_t>& GetTagId2TagInds() const {
return tag_id_2_tag_inds_;
}

// convert tag_ind (us) to tag ids
std::vector<int32_t> tag_ind_2_tag_ids_;
// convert tag ids(pb) to tag_inds
48 changes: 45 additions & 3 deletions flex/codegen/src/codegen_utils.h
@@ -203,14 +203,56 @@ static codegen::ParamConst variable_to_param_const(const common::Variable& var,
param_const.var_name = var.property().key().name();
param_const.type =
common_data_type_pb_2_data_type(var.node_type().data_type());
} else {
param_const.var_name = ctx.GetNextVarName();
param_const.type = codegen::DataType::kVertexId;
} else if (var.has_tag()) {
// check is vertex or is edge from node_type
if (var.has_node_type()) {
auto node_type = var.node_type();
param_const.var_name = ctx.GetNextVarName();
if (node_type.type_case() == common::IrDataType::kDataType) {
param_const.type =
common_data_type_pb_2_data_type(node_type.data_type());
} else {
auto graph_type = node_type.graph_type();
if (graph_type.element_opt() ==
common::GraphDataType::GraphElementOpt::
GraphDataType_GraphElementOpt_VERTEX) {
param_const.type = codegen::DataType::kVertexId;
} else if (graph_type.element_opt() ==
common::GraphDataType::GraphElementOpt::
GraphDataType_GraphElementOpt_EDGE) {
param_const.type = codegen::DataType::kEdgeId;
} else {
LOG(FATAL) << "Unexpect graph type";
}
}
} else {
LOG(FATAL)
<< "Node type is not given when converting variable to param const";
}
}

return param_const;
}

static std::string interval_to_str(const common::Extract::Interval& interval) {
switch (interval) {
case common::Extract::Interval::Extract_Interval_YEAR:
return "Interval::YEAR";
case common::Extract::Interval::Extract_Interval_MONTH:
return "Interval::MONTH";
case common::Extract::Interval::Extract_Interval_DAY:
return "Interval::DAY";
case common::Extract::Interval::Extract_Interval_HOUR:
return "Interval::HOUR";
case common::Extract::Interval::Extract_Interval_MINUTE:
return "Interval::MINUTE";
case common::Extract::Interval::Extract_Interval_SECOND:
return "Interval::SECOND";
default:
LOG(FATAL) << "Unexpected interval" << interval;
}
}

} // namespace gs

#endif // CODEGEN_SRC_CODEGEN_UTILS_H_
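`interval_to_str` only maps each `common::Extract::Interval` value to a generated `Interval::...` token. What extracting that field means is left to the runtime. As an illustration, here is a self-contained C++ sketch of extracting each interval from a timestamp. It assumes, hypothetically, milliseconds since the Unix epoch in UTC; the actual `Date` representation used by Flex is not shown in this diff. The calendar fields come from Howard Hinnant's well-known `civil_from_days` algorithm, and `Interval`, `floor_div` and `extract` are illustrative names.

```cpp
#include <cstdint>

// Mirrors the interval names emitted by interval_to_str (illustrative only).
enum class Interval { YEAR, MONTH, DAY, HOUR, MINUTE, SECOND };

// Floor division, so that timestamps before 1970 map to the correct day.
static int64_t floor_div(int64_t a, int64_t b) {
  int64_t q = a / b;
  return (a % b != 0 && ((a < 0) != (b < 0))) ? q - 1 : q;
}

// Howard Hinnant's civil_from_days: days since 1970-01-01 -> (y, m, d).
static void civil_from_days(int64_t z, int64_t& y, int64_t& m, int64_t& d) {
  z += 719468;
  const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  const int64_t doe = z - era * 146097;                    // [0, 146096]
  const int64_t yoe =
      (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
  const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // [0, 365]
  const int64_t mp = (5 * doy + 2) / 153;                  // [0, 11]
  d = doy - (153 * mp + 2) / 5 + 1;                        // [1, 31]
  m = mp < 10 ? mp + 3 : mp - 9;                           // [1, 12]
  y = yoe + era * 400 + (m <= 2 ? 1 : 0);
}

// Extract one field from a UTC timestamp in milliseconds since the epoch.
int64_t extract(Interval interval, int64_t millis) {
  constexpr int64_t kMsPerDay = 86400000;
  const int64_t days = floor_div(millis, kMsPerDay);
  const int64_t ms_of_day = millis - days * kMsPerDay;  // [0, kMsPerDay)
  int64_t y, m, d;
  civil_from_days(days, y, m, d);
  switch (interval) {
    case Interval::YEAR:   return y;
    case Interval::MONTH:  return m;
    case Interval::DAY:    return d;
    case Interval::HOUR:   return ms_of_day / 3600000;
    case Interval::MINUTE: return (ms_of_day / 60000) % 60;
    case Interval::SECOND: return (ms_of_day / 1000) % 60;
  }
  return -1;  // unreachable for valid enum values
}
```

For example, `extract(Interval::YEAR, 0)` is 1970, and a timestamp of `-1` lands on 1969-12-31 at 23:59:59. Using floor division instead of C++'s truncating `/` is what makes pre-1970 values come out right.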
52 changes: 44 additions & 8 deletions flex/codegen/src/graph_types.h
@@ -38,6 +38,11 @@ enum class DataType {
kInt32Array = 6,
kBoolean = 7,
kVertexId = 8,
kEdgeId = 9,
kLength = 10,
kTime = 11,
kDate = 12,
kDateTime = 13,
};

// a parameter const, the real data will be feed at runtime.
@@ -72,14 +77,18 @@ static codegen::DataType common_data_type_pb_2_data_type(
return codegen::DataType::kInt32Array;
case common::DataType::BOOLEAN:
return codegen::DataType::kBoolean;
case common::DataType::DATE:
return codegen::DataType::kDate;
default:
// LOG(FATAL) << "unknown data type";
throw std::runtime_error("unknown data type" +
std::to_string(static_cast<int>(data_type)));
throw std::runtime_error(
"unknown data type when converting common_data_type to inner data "
"type:" +
std::to_string(static_cast<int>(data_type)));
}
}

static std::string common_data_type_pb_2_str(
static std::string single_common_data_type_pb_2_str(
const common::DataType& data_type) {
switch (data_type) {
case common::DataType::BOOLEAN:
@@ -96,14 +105,34 @@ static std::string common_data_type_pb_2_str(
return "std::vector<int64_t>";
case common::DataType::INT32_ARRAY:
return "std::vector<int32_t>";
case common::DataType::DATE:
return "Date";
default:
// LOG(FATAL) << "unknown data type";
// return "";
throw std::runtime_error("unknown data type" +
std::to_string(static_cast<int>(data_type)));
throw std::runtime_error(
"unknown data type when convert common data type to string:" +
std::to_string(static_cast<int>(data_type)));
}
}

static std::string common_data_type_pb_2_str(
const std::vector<common::DataType>& data_types) {
std::stringstream ss;
if (data_types.size() == 1) {
return single_common_data_type_pb_2_str(data_types[0]);
}
ss << "std::tuple<";
for (auto i = 0; i < data_types.size(); ++i) {
ss << single_common_data_type_pb_2_str(data_types[i]);
if (i + 1 < data_types.size()) {
ss << ", ";
}
}
ss << ">;";
return ss.str();
}

static std::string arith_to_str(const common::Arithmetic& arith_type) {
switch (arith_type) {
case common::Arithmetic::ADD:
@@ -158,10 +187,17 @@ static std::string data_type_2_string(const codegen::DataType& data_type) {
return "bool";
case codegen::DataType::kVertexId:
return VERTEX_ID_T;
case codegen::DataType::kLength:
return LENGTH_KEY_T;
case codegen::DataType::kEdgeId:
return EDGE_ID_T;
case codegen::DataType::kDate:
return "Date";
default:
// LOG(FATAL) << "unknown data type" << static_cast<int>(data_type);
throw std::runtime_error("unknown data type" +
std::to_string(static_cast<int>(data_type)));
throw std::runtime_error(
"unknown data type when convert inner data_type to string: " +
std::to_string(static_cast<int>(data_type)));
}
}

@@ -180,7 +216,7 @@ static std::string decode_type_as_str(const codegen::DataType& data_type) {
return "get_bool()";
default:
// LOG(FATAL) << "unknown data type" << static_cast<int>(data_type);
throw std::runtime_error("unknown data type" +
throw std::runtime_error("unknown data type when decode type as str: " +
std::to_string(static_cast<int>(data_type)));
}
}
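The new vector overload of `common_data_type_pb_2_str` returns a single column's type unchanged and joins several columns into a `std::tuple<...>` spelling. The standalone C++ sketch below reproduces that joining logic with a hypothetical `PbType` enum standing in for `common::DataType`. `Date` and `std::vector<int64_t>` match the strings in the diff; the other spellings are illustrative assumptions. Two differences from the diffed code are deliberate. The loop index is `size_t`, which avoids a signed/unsigned comparison with `size()`. The closing token is `>` rather than `>;`, since a trailing semicolon inside a type name would likely break code that splices the string into a declaration or template argument.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for common::DataType (only a few cases modeled).
enum class PbType { kBoolean, kInt64, kInt64Array, kDate };

std::string single_type_str(PbType t) {
  switch (t) {
    case PbType::kBoolean:    return "bool";
    case PbType::kInt64:      return "int64_t";
    case PbType::kInt64Array: return "std::vector<int64_t>";
    case PbType::kDate:       return "Date";
  }
  return "";  // unreachable for valid enum values
}

// One column keeps its plain type; several columns become a std::tuple.
std::string types_str(const std::vector<PbType>& types) {
  if (types.size() == 1) return single_type_str(types[0]);
  std::ostringstream ss;
  ss << "std::tuple<";
  for (size_t i = 0; i < types.size(); ++i) {
    if (i > 0) ss << ", ";
    ss << single_type_str(types[i]);
  }
  ss << ">";
  return ss.str();
}
```

For example, `types_str({PbType::kInt64, PbType::kDate})` gives `std::tuple<int64_t, Date>`, while a single `kDate` column gives plain `Date`.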