parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Replace the current ad hoc CSV reader with the Arrow CSV reader, to
support more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader` and `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` supports loading a
fragment from CSV with the following options:
- delimiter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default '\\'
- batch_size: the batch size used when reading a file into memory,
default 1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` is used to parse the input file.
Otherwise, `arrow::csv::TableReader` is used.
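
As a rough illustration of the options listed above, the following shell sketch prints them as a flat YAML fragment with the stated defaults. The function name `print_csv_options` is hypothetical. Where these keys nest inside `bulk_load.yaml` is not stated in this message, so the flat layout is an assumption, and `batch_size` is left out because its value syntax is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CSV loading options from this commit message
# as a flat YAML fragment. An optional first argument overrides batch_reader.
# ASSUMPTION: the real nesting of these keys in bulk_load.yaml is not shown
# in the source; batch_size (default 1MB) is omitted for the same reason.
print_csv_options() {
  # Quoted heredoc delimiter: the text is emitted literally, no expansion.
  cat <<'EOF'
delimiter: "|"
header_row: true
quoting: false
quoting_char: '"'
escaping: false
escaping_char: "\\"
EOF
  # false selects the table reader, true selects the streaming reader.
  printf 'batch_reader: %s\n' "${1:-false}"
}

print_csv_options true
```

Calling `print_csv_options` without an argument keeps `batch_reader: false`, the default named in the list.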

With this PR merged, graph loading becomes faster, as the tables below
show. "Adhoc Reader" denotes the CSV parser currently in use; the columns
1, 2, 4, 8 denote the parallelism of graph loading, i.e. how many
vertex/edge labels are processed concurrently.

Note that in the ReadFile phase TableReader is around 10x faster than
StreamingReader. A possible reason is that TableReader uses
multi-threading.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --- | --- | --- | --- | --- | --- |
| Adhoc Reader | ReadFile+LoadGraph | 805s | 468s | 349s | 313s |
| Adhoc Reader | Serialization | 126s | 126s | 126s | 126s |
| Adhoc Reader | **Total** | 931s | 594s | 475s | 439s |
| Table Reader | ReadFile | 9s | 9s | 9s | 9s |
| Table Reader | LoadGraph | 455s | 280s | 211s | 182s |
| Table Reader | Serialization | 126s | 126s | 126s | 126s |
| Table Reader | **Total** | 600s | 415s | 346s | 317s |
| Streaming Reader | ReadFile | 91s | 91s | 91s | 91s |
| Streaming Reader | LoadGraph | 555s | 289s | 196s | 149s |
| Streaming Reader | Serialization | 126s | 126s | 126s | 126s |
| Streaming Reader | **Total** | 772s | 506s | 413s | 366s |

| Reader | Phase | 1 | 2 | 4 | 8 |
| --- | --- | --- | --- | --- | --- |
| Adhoc Reader | ReadFile+LoadGraph | 2720s | 1548s | 1176s | 948s |
| Adhoc Reader | Serialization | 409s | 409s | 409s | 409s |
| Adhoc Reader | **Total** | 3129s | 1957s | 1585s | 1357s |
| Table Reader | ReadFile | 24s | 24s | 24s | 24s |
| Table Reader | LoadGraph | 1576s | 949s | 728s | 602s |
| Table Reader | Serialization | 409s | 409s | 409s | 409s |
| Table Reader | **Total** | 2009s | 1382s | 1161s | 1035s |
| Streaming Reader | ReadFile | 300s | 300s | 300s | 300s |
| Streaming Reader | LoadGraph | 1740s | 965s | 669s | 497s |
| Streaming Reader | Serialization | 409s | 409s | 409s | 409s |
| Streaming Reader | **Total** | 2539s | 1674s | 1378s | 1206s |

| Reader | Phase | 1 | 2 | 4 | 8 |
| --- | --- | --- | --- | --- | --- |
| Adhoc Reader | ReadFile+LoadGraph | 8260s | 4900s | 3603s | 2999s |
| Adhoc Reader | Serialization | 1201s | 1201s | 1201s | 1201s |
| Adhoc Reader | **Total** | 9461s | 6101s | 4804s | 4200s |
| Table Reader | ReadFile | 73s | 73s | 96s | 96s |
| Table Reader | LoadGraph | 4650s | 2768s | 2155s | 1778s |
| Table Reader | Serialization | 1201s | 1201s | 1201s | 1201s |
| Table Reader | **Total** | 5924s | 4042s | 3452s | 3075s |
| Streaming Reader | ReadFile | 889s | 889s | 889s | 889s |
| Streaming Reader | LoadGraph | 5589s | 3005s | 2200s | 1712s |
| Streaming Reader | Serialization | 1201s | 1201s | 1201s | 1201s |
| Streaming Reader | **Total** | 7679s | 5095s | 4290s | 3802s |
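
To make the comparison concrete, the ratios can be worked out directly from the figures in the first table. The helper name `ratio` is hypothetical; the numbers are copied from the table as-is. This is only a small arithmetic sketch, not part of the benchmark tooling.

```shell
#!/usr/bin/env bash
# ratio NUM DEN: print NUM/DEN rounded to two decimals.
# LC_ALL=C keeps the decimal point independent of the user's locale.
ratio() {
  LC_ALL=C awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# ReadFile phase, Streaming Reader (91s) vs. Table Reader (9s).
echo "ReadFile, streaming/table: $(ratio 91 9)x"

# End-to-end totals at parallelism 8, Adhoc Reader (439s) vs. Table Reader (317s).
echo "Total at parallelism 8, adhoc/table: $(ratio 439 317)x"
```

The first ratio is 10.11, which matches the "around 10x" remark for the ReadFile phase; the second is 1.38, the end-to-end gain over the Adhoc Reader at parallelism 8.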

Fix alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format

fix script

debug

fix test script

minor

sort query results

minor

minor

format

fix ci

format

gstest

Add License
shirly121 authored and zhanglei1949 committed Sep 25, 2023
1 parent 7314f8c commit a40efd3
Showing 21 changed files with 563 additions and 36 deletions.
40 changes: 36 additions & 4 deletions .github/workflows/hqps-db-ci.yml
@@ -78,8 +78,8 @@ jobs:
which cargo
# build compiler
cd ${GIE_HOME}/compiler
make build
cd ${GIE_HOME}/
mvn clean install -Pexperimental -DskipTests
- name: Prepare dataset and workspace
env:
@@ -91,6 +91,8 @@ jobs:
mkdir -p ${INTERACTIVE_WORKSPACE}/data/ldbc
GRAPH_SCHEMA_YAML=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_graph_schema.yaml
cp ${GRAPH_SCHEMA_YAML} ${INTERACTIVE_WORKSPACE}/data/ldbc/graph.yaml
mkdir -p ${INTERACTIVE_WORKSPACE}/data/movies
cp ${GS_TEST_DIR}/flex/movies/movies_schema.yaml ${INTERACTIVE_WORKSPACE}/data/movies/graph.yaml
- name: Sample Query test
env:
@@ -129,7 +131,19 @@ jobs:
eval ${cmd}
done
- name: Run End-to-End cypher adhoc query test
# test movie graph, 8,9,10 are not supported now
# change the default_graph config in ${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml to movies
sed -i 's/default_graph: ldbc/default_graph: movies/g' ${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml
for i in 1 2 3 4 5 6 7 11 12 13 14 15;
do
cmd="./load_plan_and_gen.sh -e=hqps -i=../tests/hqps/queries/movie/query${i}.cypher -w=/tmp/codgen/"
cmd=${cmd}" -o=/tmp/plugin --ir_conf=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml "
cmd=${cmd}" --graph_schema_path=${INTERACTIVE_WORKSPACE}/data/movies/graph.yaml"
echo $cmd
eval ${cmd}
done
- name: Run End-to-End cypher adhoc ldbc query test
env:
GS_TEST_DIR: ${{ github.workspace }}/gstest
HOME : /home/graphscope/
@@ -138,5 +152,23 @@ jobs:
cd ${GITHUB_WORKSPACE}/flex/tests/hqps/
export FLEX_DATA_DIR=${GS_TEST_DIR}/flex/ldbc-sf01-long-date
export ENGINE_TYPE=hiactor
bash hqps_cypher_test.sh ${GS_TEST_DIR} ${INTERACTIVE_WORKSPACE}
# change the default_graph config in ${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml to ldbc
sed -i 's/default_graph: movies/default_graph: ldbc/g' ${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml
bash hqps_cypher_test.sh ${INTERACTIVE_WORKSPACE} ldbc ${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_bulk_load.yaml \
${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml
- name: Run End-to-End cypher adhoc movie query test
env:
GS_TEST_DIR: ${{ github.workspace }}/gstest
HOME : /home/graphscope/
INTERACTIVE_WORKSPACE: /tmp/interactive_workspace
run: |
cd ${GITHUB_WORKSPACE}/flex/tests/hqps/
export FLEX_DATA_DIR=../../interactive/examples/movies/
export ENGINE_TYPE=hiactor
# change the default_graph config in ${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml to movies
sed -i 's/default_graph: ldbc/default_graph: movies/g' ${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml
bash hqps_cypher_test.sh ${INTERACTIVE_WORKSPACE} movies ${GS_TEST_DIR}/flex/movies/movies_import.yaml \
${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml
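
The workflow switches `default_graph` back and forth with two `sed -i` calls, each of which only matches when the other graph name is currently set. A minimal sketch of a single helper that sets the value regardless of what is there is shown below. The function name `set_default_graph` and the demo file are hypothetical, and the sketch assumes `default_graph:` appears on its own line in the engine config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the default_graph entry of an engine config file.
# Unlike the workflow's sed calls, it replaces whatever value is currently set.
# ASSUMPTION: default_graph sits on its own line, optionally indented.
set_default_graph() {
  config="$1"
  graph="$2"
  tmp="$(mktemp)" || return 1
  # Write to a temp file and copy it back, which avoids the GNU/BSD
  # differences of `sed -i` and keeps the original file's permissions.
  sed -E "s/^([[:space:]]*default_graph:[[:space:]]*).*/\1${graph}/" "$config" > "$tmp" \
    && cat "$tmp" > "$config"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a throwaway file rather than a real engine_config.yaml.
demo_config="$(mktemp)"
printf 'default_graph: ldbc\n' > "$demo_config"
set_default_graph "$demo_config" movies
cat "$demo_config"
rm -f "$demo_config"
```

The demo prints `default_graph: movies`. Because the helper does not depend on the previous value, the LDBC and movie steps could run in either order.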
@@ -1 +1 @@
MATCH(p : person {id: $personId}) RETURN p.name;
MATCH(p : person {id: $personId}) RETURN p.name;
79 changes: 48 additions & 31 deletions flex/tests/hqps/hqps_cypher_test.sh
@@ -19,47 +19,50 @@ SERVER_BIN=${FLEX_HOME}/build/bin/sync_server
GIE_HOME=${FLEX_HOME}/../interactive_engine/

#
if [ $# -lt 2 ]; then
echo "only receives: $# args, need 2"
echo "Usage: $0 <GS_TEST_DIR> <INTERACTIVE_WORKSPACE>"
if [ ! $# -eq 4 ]; then
echo "only receives: $# args, need 4"
echo "Usage: $0 <INTERACTIVE_WORKSPACE> <GRAPH_NAME> <BULK_LOAD_FILE> <ENGINE_CONFIG>"
exit 1
fi

GS_TEST_DIR=$1
INTERACTIVE_WORKSPACE=$2
if [ ! -d ${GS_TEST_DIR} ]; then
echo "GS_TEST_DIR: ${GS_TEST_DIR} not exists"
exit 1
fi
INTERACTIVE_WORKSPACE=$1
GRAPH_NAME=$2
GRAPH_BULK_LOAD_YAML=$3
ENGINE_CONFIG_PATH=$4
if [ ! -d ${INTERACTIVE_WORKSPACE} ]; then
echo "INTERACTIVE_WORKSPACE: ${INTERACTIVE_WORKSPACE} not exists"
exit 1
fi

ENGINE_CONFIG_PATH=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml
ORI_GRAPH_SCHEMA_YAML=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_graph_schema.yaml
GRAPH_SCHEMA_YAML=${INTERACTIVE_WORKSPACE}/data/ldbc/graph.yaml
GRAPH_BULK_LOAD_YAML=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_bulk_load.yaml
COMPILER_GRAPH_SCHEMA=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_graph_schema.yaml
GRAPH_CSR_DATA_DIR=${HOME}/csr-data-dir/
# check if GRAPH_SCHEMA_YAML exists
if [ ! -f ${GRAPH_SCHEMA_YAML} ]; then
echo "GRAPH_SCHEMA_YAML: ${GRAPH_SCHEMA_YAML} not found"
# check graph is ldbc or movies
if [ ${GRAPH_NAME} != "ldbc" ] && [ ${GRAPH_NAME} != "movies" ]; then
echo "GRAPH_NAME: ${GRAPH_NAME} not supported, use movies or ldbc"
exit 1
fi
if [ ! -d ${INTERACTIVE_WORKSPACE}/data/${GRAPH_NAME} ]; then
echo "GRAPH: ${GRAPH_NAME} not exists"
exit 1
fi
if [ ! -f ${INTERACTIVE_WORKSPACE}/data/${GRAPH_NAME}/graph.yaml ]; then
echo "GRAPH_SCHEMA_FILE: ${INTERACTIVE_WORKSPACE}/data/${GRAPH_NAME}/graph.yaml not exists"
exit 1
fi

# check if GRAPH_BULK_LOAD_YAML exists
if [ ! -f ${GRAPH_BULK_LOAD_YAML} ]; then
echo "GRAPH_BULK_LOAD_YAML: ${GRAPH_BULK_LOAD_YAML} not found"
echo "GRAPH_BULK_LOAD_YAML: ${GRAPH_BULK_LOAD_YAML} not exists"
exit 1
fi

# check if COMPILER_GRAPH_SCHEMA exists
if [ ! -f ${COMPILER_GRAPH_SCHEMA} ]; then
echo "COMPILER_GRAPH_SCHEMA: ${COMPILER_GRAPH_SCHEMA} not found"
if [ ! -f ${ENGINE_CONFIG_PATH} ]; then
echo "ENGINE_CONFIG: ${ENGINE_CONFIG_PATH} not exists"
exit 1
fi

GRAPH_SCHEMA_YAML=${INTERACTIVE_WORKSPACE}/data/${GRAPH_NAME}/graph.yaml
GRAPH_CSR_DATA_DIR=${HOME}/csr-data-dir/
# rm data dir if exists
if [ -d ${GRAPH_CSR_DATA_DIR} ]; then
rm -rf ${GRAPH_CSR_DATA_DIR}
fi


RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color
@@ -92,8 +95,6 @@ start_engine_service(){
err "SERVER_BIN not found"
exit 1
fi
# export FLEX_DATA_DIR
export FLEX_DATA_DIR=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/

cmd="${SERVER_BIN} -c ${ENGINE_CONFIG_PATH} -g ${GRAPH_SCHEMA_YAML} "
cmd="${cmd} --data-path ${GRAPH_CSR_DATA_DIR} -l ${GRAPH_BULK_LOAD_YAML} "
@@ -111,7 +112,7 @@ start_engine_service(){
start_compiler_service(){
echo "try to start compiler service"
pushd ${GIE_HOME}/compiler
cmd="make run graph.schema=${COMPILER_GRAPH_SCHEMA} config.path=${ENGINE_CONFIG_PATH}"
cmd="make run graph.schema=${GRAPH_SCHEMA_YAML} config.path=${ENGINE_CONFIG_PATH}"
echo "Start compiler service with command: ${cmd}"
${cmd} &
sleep 5
@@ -141,11 +142,27 @@ run_simple_test(){
popd
}

run_movie_test(){
echo "run movie test"
pushd ${GIE_HOME}/compiler
cmd="mvn test -Dtest=com.alibaba.graphscope.cypher.integration.movie.MovieTest"
echo "Start movie test: ${cmd}"
${cmd}
info "Finish movie test"
popd
}

kill_service
start_engine_service
start_compiler_service
run_ldbc_test
run_simple_test
# if GRAPH_NAME equals ldbc
if [ "${GRAPH_NAME}" == "ldbc" ]; then
run_ldbc_test
run_simple_test
else
run_movie_test
fi

kill_service
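
The argument checks at the top of this script use unquoted expansions such as `[ ${GRAPH_NAME} != "ldbc" ]`, which misbehave when a value is empty or contains spaces. One possible restatement with every expansion quoted is sketched below. The function name `validate_args` and its messages are hypothetical; the four positional arguments and the `data/<graph>/graph.yaml` layout follow the script above.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the script's argument checks with all
# expansions quoted. It returns non-zero instead of exiting, so callers
# decide what to do on failure.
# Arguments: <INTERACTIVE_WORKSPACE> <GRAPH_NAME> <BULK_LOAD_FILE> <ENGINE_CONFIG>
validate_args() {
  if [ "$#" -ne 4 ]; then
    echo "received $# args, need 4" >&2
    return 1
  fi
  workspace="$1"; graph="$2"; bulk_load="$3"; engine_config="$4"

  # Only the two graphs handled by the test script are accepted.
  case "$graph" in
    ldbc|movies) ;;
    *) echo "GRAPH_NAME: ${graph} not supported, use movies or ldbc" >&2
       return 1 ;;
  esac

  # The schema, bulk load file and engine config must all exist.
  for f in "${workspace}/data/${graph}/graph.yaml" "$bulk_load" "$engine_config"; do
    if [ ! -f "$f" ]; then
      echo "required file does not exist: ${f}" >&2
      return 1
    fi
  done
}
```

In the script it could be called as `validate_args "$@" || exit 1` before any service is started.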


1 change: 1 addition & 0 deletions flex/tests/hqps/queries/movie/query1.cypher
@@ -0,0 +1 @@
MATCH (tom:Person) WHERE tom.name = "Tom Hanks" RETURN tom.born AS bornYear,tom.name AS personName;
4 changes: 4 additions & 0 deletions flex/tests/hqps/queries/movie/query10.cypher
@@ -0,0 +1,4 @@
MATCH p=shortestPath(
(bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"})
)
RETURN p;
4 changes: 4 additions & 0 deletions flex/tests/hqps/queries/movie/query11.cypher
@@ -0,0 +1,4 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(movie:Movie)
WITH movie.title as movieTitle, movie.released as movieReleased
ORDER BY movieReleased DESC, movieTitle ASC LIMIT 10
return movieTitle, movieReleased;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query12.cypher
@@ -0,0 +1,2 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person)
WITH DISTINCT coActor.name AS coActorName ORDER BY coActorName ASC LIMIT 10 return coActorName;
4 changes: 4 additions & 0 deletions flex/tests/hqps/queries/movie/query13.cypher
@@ -0,0 +1,4 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name AS coCoActorName ORDER BY coCoActorName ASC LIMIT 10;
6 changes: 6 additions & 0 deletions flex/tests/hqps/queries/movie/query14.cypher
@@ -0,0 +1,6 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name AS coCoActorName, count(coCoActor) AS frequency
ORDER BY frequency DESC, coCoActorName ASC
LIMIT 5;
4 changes: 4 additions & 0 deletions flex/tests/hqps/queries/movie/query15.cypher
@@ -0,0 +1,4 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(cruise:Person {name: 'Tom Cruise'})
WHERE NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(cruise)
RETURN tom.name AS actorName, movie1.title AS movie1Title, coActor.name AS coActorName, movie2.title AS movie2Title, cruise.name AS coCoActorName
ORDER BY movie1Title ASC, movie2Title ASC LIMIT 10;
1 change: 1 addition & 0 deletions flex/tests/hqps/queries/movie/query2.cypher
@@ -0,0 +1 @@
MATCH (cloudAtlas:Movie {title: "Cloud Atlas"}) RETURN cloudAtlas.tagline AS tagline, cloudAtlas.released AS releasedYear,cloudAtlas.title AS title;
1 change: 1 addition & 0 deletions flex/tests/hqps/queries/movie/query3.cypher
@@ -0,0 +1 @@
MATCH (people:Person) RETURN people.name AS personName ORDER BY personName ASC LIMIT 10;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query4.cypher
@@ -0,0 +1,2 @@
MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000
RETURN nineties.title AS ninetiesTitle ORDER BY ninetiesTitle DESC LIMIT 10;
6 changes: 6 additions & 0 deletions flex/tests/hqps/queries/movie/query5.cypher
@@ -0,0 +1,6 @@
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies)
RETURN tom.born AS bornYear,
tomHanksMovies.tagline AS movieTagline,
tomHanksMovies.title AS movieTitle,
tomHanksMovies.released AS releaseYear
ORDER BY releaseYear DESC, movieTitle ASC LIMIT 10;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query6.cypher
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
MATCH (cloudAtlas:Movie {title: "Cloud Atlas"})<-[:DIRECTED]-(directors)
RETURN directors.name AS directorsName ORDER BY directorsName ASC LIMIT 10;
3 changes: 3 additions & 0 deletions flex/tests/hqps/queries/movie/query7.cypher
@@ -0,0 +1,3 @@
MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN m.title AS movieTitle, m.released AS releasedYear, coActors.name AS coActorName
ORDER BY releasedYear DESC, movieTitle ASC LIMIT 10;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query8.cypher
@@ -0,0 +1,2 @@
MATCH (people:Person)-[relatedTo]-(:Movie {title: "Cloud Atlas"})
RETURN people.name, type(relatedTo), relatedTo
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query9.cypher
@@ -0,0 +1,2 @@
MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..3]-(hollywood)
RETURN DISTINCT bacon, hollywood
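
The CI workflow compiles every movie query except 8, 9 and 10, which it marks as not supported yet, by spelling out the supported numbers in a `for` list. A small sketch that derives the same list from an explicit skip set is shown below. The function name `supported_movie_queries` is hypothetical; the file naming `query<N>.cypher` follows the files added in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the movie query files the CI loop compiles,
# skipping queries 8, 9 and 10, which the workflow marks as unsupported.
supported_movie_queries() {
  i=1
  while [ "$i" -le 15 ]; do
    case "$i" in
      8|9|10) ;;                      # not supported yet, skip
      *) echo "query${i}.cypher" ;;
    esac
    i=$((i + 1))
  done
}

supported_movie_queries
```

Once one of the skipped queries becomes supported, only the `case` pattern needs to change.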
1 change: 1 addition & 0 deletions interactive_engine/compiler/pom.xml
@@ -253,6 +253,7 @@
<exclude>**/IrLdbcTest.java</exclude>
<exclude>**/SimpleMatchTest.java</exclude>
<exclude>**/IrPatternTest.java</exclude>
<exclude>**/MovieTest.java</exclude>
</excludes>
</configuration>
</plugin>