feat(interactive): support project properties of a path (alibaba#3213)

As titled, this change supports projecting a property of a `path`, i.e., the
property is projected from each element in the `path`.

For example, on the modern graph:
```
gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'ALL_V').values("name")
==>[marko, vadas]
==>[marko, josh]
```
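The projection above can be pictured as a per-path map over the path's elements. The following is only a minimal sketch of that idea, not the engine's implementation: `VertexId`, `Path`, and `project_path_property` are hypothetical names introduced for illustration, and the property store is assumed to be a plain in-memory map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; these are not GraphScope types.
using VertexId = int;
using Path = std::vector<VertexId>;

// Project one property over every vertex of a path, producing one value per
// element, in path order. Assumption of this sketch: every vertex on the path
// has the property.
std::vector<std::string> project_path_property(
    const Path& path,
    const std::unordered_map<VertexId, std::string>& names) {
  std::vector<std::string> projected;
  projected.reserve(path.size());
  for (VertexId v : path) {
    auto it = names.find(v);
    assert(it != names.end());  // precondition of the sketch, see above
    projected.push_back(it->second);
  }
  return projected;
}
```

With a name map that assigns "marko", "vadas", and "josh" to three vertices, a two-vertex path from marko to vadas projects to the list `[marko, vadas]`, matching the shape of the Gremlin output: one list per path rather than one value per row.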

Fixes alibaba#3199

test for multi-match logical plan and physical plan

Committed-by: bingqing.lbq from Dev container

[Fix]
1. Fix the bug of multiple pattern matching
2. Add more test cases for pattern matching

[Test]
1. Edit testcases
2. Fix some bugs
3. The new multiple-match implementation with a dummy source now passes all the tests on the Rust side

minor: test for multi-source-multi-match logical plan and physical plan

[CI Tests] more multi source related ci tests

[Bug Fix]
1. Edit the methods for getting subplans and branch plans; they now support nested branch plans
2. Introduce a new Branch Logical Operator to help handle branches

[Bug Fix and Testcases]
1. Fix bugs in the `get_merge_node` and `get_branch_node` methods when facing more complicated logical plans; they now use stricter conditions for the final-merge/first-branch check
2. Fix a bug in `append_branch_plan`: when a branch plan has a node whose id already exists in the original plan, the existing node is no longer overwritten; instead, the children of the two nodes are merged. This solves the problem of two branch nodes overlapping when finding subplans
3. Add more test cases to validate the new `subplan` and `get_branch_plans` methods

[Bug Fix and Testcases]

1. Add 6 types of logical plans for testing
1.1 Verify the correctness of `get_merge_node`, `get_branch_node`, `subplan`, and `get_branch_plans` with these logical plans
1.2 Verify whether these logical plans can be converted to the expected physical plans
1.3 Verify whether the queries based on these logical plans generate the expected results

2. Use a flow algorithm in `get_merge_node` and `get_branch_node`, greatly improving efficiency
2.1 Implement a Fraction abstraction to preserve precision during flow comparison

3. Solve a problem in `append_branch_plans`: it now supports appending branch plans that have nodes already existing in the original plan

4. Completely rewrite the logic of `get_merge_node`, `get_branch_node`, `subplan`, and `get_branch_plans`
4.1 In theory, it can now support an arbitrary DAG logical plan as long as the plan is reasonable

5. Solve a problem in `extract_subplans`: the last node of a subplan can now be a merge node

6. Make the code much safer: it now checks the existence of merge node subplans much more strictly
6.1 Remove `expect`; return None or an Error instead

7. Add more comments that explain the logic for handling nested branches
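
The commit message does not spell out the flow algorithm, so the following is only one plausible reading, sketched under stated assumptions: push one unit of flow out of the branch node, split it equally among each node's children, and report the first later node that collects the whole unit. Exact fractions matter because sums such as 1/3 + 1/3 + 1/3 are not reliably equal to 1 in floating point. `Fraction` and `find_merge_node` here are hypothetical illustrations, not the Rust code from this change, and node ids are assumed to be in topological order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Exact rational number kept in lowest terms (hypothetical helper).
struct Fraction {
  std::int64_t num = 0;
  std::int64_t den = 1;

  Fraction() = default;
  Fraction(std::int64_t n, std::int64_t d) : num(n), den(d) {
    assert(d > 0);
    const std::int64_t g = std::gcd(n, d);  // g >= 1 because d > 0
    num /= g;
    den /= g;
  }
  Fraction operator+(const Fraction& o) const {
    return Fraction(num * o.den + o.num * den, den * o.den);
  }
  Fraction operator/(std::int64_t k) const { return Fraction(num, den * k); }
  bool operator==(const Fraction& o) const {
    return num == o.num && den == o.den;
  }
};

// children[v] lists the successors of node v. Assumption: every edge goes from
// a smaller id to a larger id, so visiting ids in order is a topological walk.
std::optional<std::size_t> find_merge_node(
    const std::vector<std::vector<std::size_t>>& children, std::size_t branch) {
  assert(branch < children.size());
  std::vector<Fraction> flow(children.size());
  flow[branch] = Fraction(1, 1);
  for (std::size_t v = branch; v < children.size(); ++v) {
    // The first node after the branch that receives the full unit of flow is
    // the point where all of the branch's paths have come back together.
    if (v != branch && flow[v] == Fraction(1, 1)) return v;
    if (flow[v].num == 0 || children[v].empty()) continue;
    const Fraction share =
        flow[v] / static_cast<std::int64_t>(children[v].size());
    for (std::size_t c : children[v]) flow[c] = flow[c] + share;
  }
  return std::nullopt;  // the branches never rejoin
}
```

For a plan where node 0 fans out to nodes 1, 2, and 3 and all three feed node 4, each child carries 1/3 and node 4 collects exactly 1, so it is reported as the merge node. Nested branches work the same way: an inner branch node started with its own unit of flow finds its own, earlier merge node.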

import a third-party fraction lib to replace the self-written one

[GIE Compiler] support optional match in compiler

add FilterIntoJoin Rule

[GIE Compiler] support converting optional match to left outer join

[GIE Compiler] add 'NotExistToAntiJoinRule' to convert not exist subquery to anti join
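
The two rewrites named above rely on standard join semantics: an optional match keeps every left row and fills in nothing when no partner exists (left outer join), while a not-exist subquery keeps only the left rows that have no partner (anti join). The sketch below illustrates those semantics only; it is not the compiler rule. `Row`, `Joined`, `left_outer_join`, and `anti_join` are hypothetical names, and the right side is assumed to have unique join keys.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::pair<int, std::string>;  // (join key, payload); hypothetical

struct Joined {
  int key;
  std::string left;
  std::optional<std::string> right;  // empty when the left row has no partner
};

// Left outer join: every left row survives, matched or not.
std::vector<Joined> left_outer_join(
    const std::vector<Row>& left,
    const std::unordered_map<int, std::string>& right) {
  std::vector<Joined> out;
  for (const Row& row : left) {
    std::optional<std::string> match;
    if (auto it = right.find(row.first); it != right.end()) match = it->second;
    out.push_back(Joined{row.first, row.second, match});
  }
  return out;
}

// Anti join: keep only the left rows whose key is absent from the right side.
std::vector<Row> anti_join(const std::vector<Row>& left,
                           const std::unordered_map<int, std::string>& right) {
  std::vector<Row> out;
  for (const Row& row : left) {
    if (right.count(row.first) == 0) out.push_back(row);
  }
  assert(out.size() <= left.size());  // an anti join never adds rows
  return out;
}
```

Given left rows with keys 1, 2, 3 and a right side holding keys 1 and 3, the left outer join returns three rows with an empty right value for key 2, and the anti join returns only the row with key 2.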

[GIE Compiler] fix bugs in ic queries

[GIE Compiler] fix unit tests

[GIE Compiler] support IS_NULL and IS_NOT_NULL in cypher queries

[GIE Compiler] minor fix

[GIE Compiler] minor fix

[GIE Compiler] remove IS_NULL

[GIE Compiler] remove ffi build unit test from compiler

[GIE Compiler] add doc

[GIE Compiler] support IS_NULL and IS_NOT_NULL in cypher queries

[GIE Compiler] minor fix

[GIE Compiler] support ListLiteral in cypher queries

[IR Core] add First in FfiAggOpt

[IR Runtime] support VarMap in Project

[GIE Compiler] support types : 'GraphPathType' and 'List<Any>'

[GIE Compiler] minor fix

[GIE Compiler] support extract operator in compiler

[GIE Compiler] fix bugs

[GIE Compiler] minor fix

add support for ic1

todo ic10

fix ic10 support

modify ic1 cypher

stash

todo: fix correctness

fix ic10

fix

rebase main

fixing rebase

todo: add e2e ci

adding 4,7,10 test

check ci

format and refactor

format

todo: reset gstest

some refactor and format

disable ic8 test

fixup

fixup
BingqingLyu authored and zhanglei1949 committed Sep 18, 2023
1 parent 11d6ae6 commit 6ef4128
Showing 56 changed files with 1,951 additions and 356 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/hqps-db-ci.yml
@@ -86,7 +86,7 @@ jobs:
GS_TEST_DIR: ${{ github.workspace }}/gstest
run: |
# download dataset
git clone -b master --single-branch --depth=1 https://github.com/Graphscope/gstest.git ${GS_TEST_DIR}
git clone -b master --single-branch --depth=1 https://github.com/GraphScope/gstest.git ${GS_TEST_DIR}
- name: Sample Query test
env:
@@ -112,18 +112,18 @@ jobs:
echo "graph.store: exp" >> /tmp/ir.compiler.properties
echo "graph.planner.is.on: true" >> /tmp/ir.compiler.properties
echo "graph.planner.opt: RBO" >> /tmp/ir.compiler.properties
echo "graph.planner.rules: FilterMatchRule" >> /tmp/ir.compiler.properties
echo "graph.planner.rules: FilterMatchRule,NotMatchToAntiJoinRule" >> /tmp/ir.compiler.properties
cd ${GITHUB_WORKSPACE}/flex/bin
for i in 2 3 5 6 8 9 11 12;
for i in 1 2 3 4 5 6 7 8 9 10 11 12;
do
cmd="./load_plan_and_gen.sh -e=hqps -i=../resources/queries/ic/adhoc/ic${i}_adhoc.cypher -w=/tmp/codgen/"
cmd=${cmd}" -o=/tmp/plugin --ir_conf=/tmp/ir.compiler.properties "
cmd=${cmd}" --graph_schema_path=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/ldbc_schema_csr_ic.json"
cmd=${cmd}" --gie_home=${GIE_HOME}"
echo $cmd
eval ${cmd}
eval ${cmd}
done
for i in 1 2 3 4 5 6 7 8 9;
101 changes: 54 additions & 47 deletions flex/CMakeLists.txt
@@ -1,14 +1,14 @@
cmake_minimum_required (VERSION 3.5)
cmake_minimum_required(VERSION 3.5)

file(READ ${CMAKE_CURRENT_SOURCE_DIR}/../VERSION FLEX_VERSION)

# Strip trailing newline
string(REGEX REPLACE "\n$" "" FLEX_VERSION "${FLEX_VERSION}")

project (
Flex
VERSION ${FLEX_VERSION}
LANGUAGES CXX)

project(
Flex
VERSION ${FLEX_VERSION}
LANGUAGES CXX)

option(BUILD_HQPS "Whether to build HighQPS Engine" ON)
option(BUILD_TEST "Whether to build test" ON)
@@ -21,14 +21,13 @@ set(DEFAULT_BUILD_TYPE "Release")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17 -mno-avx512f -fPIC")
set(CMAKE_CXX_FLAGS_DEBUG "-g3 -O0")


add_compile_definitions(FLEX_VERSION="${FLEX_VERSION}")

if (APPLE)
if(APPLE)
set(CMAKE_MACOSX_RPATH ON)
else ()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fopenmp -Werror -Wl,-rpath,$ORIGIN")
endif ()
endif()

find_package(MPI REQUIRED)
include_directories(SYSTEM ${MPI_CXX_INCLUDE_PATH})
@@ -43,52 +42,59 @@ find_package(Threads REQUIRED)

# find glog---------------------------------------------------------------------
include("cmake/FindGlog.cmake")
if (NOT GLOG_FOUND)

if(NOT GLOG_FOUND)
message(FATAL_ERROR "glog not found, please install the glog library")
else ()
else()
include_directories(SYSTEM ${GLOG_INCLUDE_DIRS})
endif ()
endif()

# find gflags-------------------------------------------------------------------
include("cmake/FindGFlags.cmake")
if (NOT GFLAGS_FOUND)

if(NOT GFLAGS_FOUND)
message(STATUS "gflags not found, build without gflags")
else ()
else()
include_directories(SYSTEM ${GFLAGS_INCLUDE_DIRS})
endif ()
endif()

#find boost----------------------------------------------------------------------
# find boost----------------------------------------------------------------------
find_package(Boost REQUIRED COMPONENTS system filesystem
# required by folly
context program_options regex thread)

#find arrow----------------------------------------------------------------------
# required by folly
context program_options regex thread)

# find arrow----------------------------------------------------------------------
include("cmake/FindArrow.cmake")
if (NOT ARROW_FOUND)

if(NOT ARROW_FOUND)
message(FATAL_ERROR "arrow not found, please install the arrow library")
else ()
else()
include_directories(SYSTEM ${ARROW_INCLUDE_DIRS})
if (TARGET arrow_shared)

if(TARGET arrow_shared)
set(ARROW_SHARED_LIB arrow_shared)
endif()
if (TARGET arrow_static)

if(TARGET arrow_static)
set(ARROW_STATIC_LIB arrow_static)
endif()
endif ()
endif()

# Find Doxygen
if (BUILD_DOC)
if(BUILD_DOC)
find_package(Doxygen)

# Add a target to generate the documentation
if(DOXYGEN_FOUND)
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/docs/Doxyfile.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile)
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
add_custom_target(doc ALL
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
WORKING_DIRECTORY ..
COMMENT "Generating API documentation with Doxygen"
VERBATIM)
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
WORKING_DIRECTORY ..
COMMENT "Generating API documentation with Doxygen"
VERBATIM)
endif(DOXYGEN_FOUND)
endif()

@@ -97,31 +103,33 @@ add_subdirectory(codegen)
add_subdirectory(storages)
add_subdirectory(engines)
add_subdirectory(bin)
if (BUILD_TEST)
add_subdirectory(tests)
endif()

if(BUILD_TEST)
add_subdirectory(tests)
endif()

file(GLOB_RECURSE FILES_NEED_LINT
"engines/*.cc"
"engines/*.h"
"bin/*.cc"
"storages/*.h"
"storages/*.cc"
"test/*.h"
"test/*.cc"
"third_pary/*.h"
"third_pary/*.cc" EXCEPT "*.act.h" "*.actg.h" "*.autogen.h" "*.autogen.cc")
"engines/*.cc"
"engines/*.h"
"bin/*.cc"
"storages/*.h"
"storages/*.cc"
"test/*.h"
"test/*.cc"
"third_pary/*.h"
"third_pary/*.cc" EXCEPT "*.act.h" "*.actg.h" "*.autogen.h" "*.autogen.cc")
list(FILTER FILES_NEED_LINT EXCLUDE REGEX ".*\.act.h$|.*\.actg.h$|.*\.autogen.h$|.*\.autogen.cc$")

# gsa_clformat
add_custom_target(flex_clformat
COMMAND clang-format --style=file -i ${FILES_NEED_LINT}
COMMENT "Running clang-format, using clang-format-8 from https://github.com/muttleyxd/clang-tools-static-binaries/releases"
VERBATIM)

if (NOT DEFINED CPACK_PACKAGE_NAME)
set(CPACK_PACKAGE_NAME "graphscope_flex")
endif ()
if(NOT DEFINED CPACK_PACKAGE_NAME)
set(CPACK_PACKAGE_NAME "graphscope_flex")
endif()

set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "Flex module of GraphScope")
set(CPACK_PACKAGE_VENDOR "GraphScope")
set(CPACK_PACKAGE_VERSION ${FLEX_VERSION})
@@ -133,8 +141,7 @@ set(CPACK_DEBIAN_FILE_NAME DEB-DEFAULT)
set(CPACK_COMPONENTS_GROUPING ALL_COMPONENTS_IN_ONE)
set(CPACK_DEB_COMPONENT_INSTALL YES)


#install CMakeLists.txt.template to resources/
# install CMakeLists.txt.template to resources/
install(FILES resources/hqps/CMakeLists.txt.template DESTINATION lib/flex/)

include(CPack)
2 changes: 1 addition & 1 deletion flex/codegen/gen_code_from_plan.cc
@@ -80,7 +80,7 @@ void deserialize_plan_and_gen_hqps(const std::string& input_file_path,
auto stream = std::istringstream(content_str);
CHECK(plan_pb.ParseFromArray(content_str.data(), content_str.size()));
LOG(INFO) << "deserilized plan size : " << plan_pb.ByteSizeLong();
LOG(INFO) << "deserilized plan : " << plan_pb.DebugString();
VLOG(1) << "deserilized plan : " << plan_pb.DebugString();
BuildingContext context;
QueryGenerator<uint8_t> query_generator(context, plan_pb);
auto res = query_generator.GenerateQuery();
4 changes: 4 additions & 0 deletions flex/codegen/src/building_context.h
@@ -113,6 +113,10 @@ struct TagIndMapping {
return tag_ind_2_tag_ids_;
}

const std::vector<int32_t>& GetTagId2TagInds() const {
return tag_id_2_tag_inds_;
}

// convert tag_ind (us) to tag ids
std::vector<int32_t> tag_ind_2_tag_ids_;
// convert tag ids(pb) to tag_inds
48 changes: 45 additions & 3 deletions flex/codegen/src/codegen_utils.h
@@ -203,14 +203,56 @@ static codegen::ParamConst variable_to_param_const(const common::Variable& var,
param_const.var_name = var.property().key().name();
param_const.type =
common_data_type_pb_2_data_type(var.node_type().data_type());
} else {
param_const.var_name = ctx.GetNextVarName();
param_const.type = codegen::DataType::kVertexId;
} else if (var.has_tag()) {
// check is vertex or is edge from node_type
if (var.has_node_type()) {
auto node_type = var.node_type();
param_const.var_name = ctx.GetNextVarName();
if (node_type.type_case() == common::IrDataType::kDataType) {
param_const.type =
common_data_type_pb_2_data_type(node_type.data_type());
} else {
auto graph_type = node_type.graph_type();
if (graph_type.element_opt() ==
common::GraphDataType::GraphElementOpt::
GraphDataType_GraphElementOpt_VERTEX) {
param_const.type = codegen::DataType::kVertexId;
} else if (graph_type.element_opt() ==
common::GraphDataType::GraphElementOpt::
GraphDataType_GraphElementOpt_EDGE) {
param_const.type = codegen::DataType::kEdgeId;
} else {
LOG(FATAL) << "Unexpect graph type";
}
}
} else {
LOG(FATAL)
<< "Node type is not given when converting variable to param const";
}
}

return param_const;
}

static std::string interval_to_str(const common::Extract::Interval& interval) {
switch (interval) {
case common::Extract::Interval::Extract_Interval_YEAR:
return "Interval::YEAR";
case common::Extract::Interval::Extract_Interval_MONTH:
return "Interval::MONTH";
case common::Extract::Interval::Extract_Interval_DAY:
return "Interval::DAY";
case common::Extract::Interval::Extract_Interval_HOUR:
return "Interval::HOUR";
case common::Extract::Interval::Extract_Interval_MINUTE:
return "Interval::MINUTE";
case common::Extract::Interval::Extract_Interval_SECOND:
return "Interval::SECOND";
default:
LOG(FATAL) << "Unexpected interval" << interval;
}
}

} // namespace gs

#endif // CODEGEN_SRC_CODEGEN_UTILS_H_