From 1b0b304fa4cec636fc94c9d7db532b3da15588d7 Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Wed, 29 Jul 2020 19:15:11 +0800 Subject: [PATCH 1/8] vid-par --- docs/manual-EN/1.overview/1.concepts/1.data-model.md | 2 +- .../insert-vertex-syntax.md | 4 ++-- docs/manual-EN/5.appendix/vid-partition.md | 11 +++++++++++ docs/manual-EN/README.md | 2 +- mkdocs.yml | 5 ++--- 5 files changed, 17 insertions(+), 7 deletions(-) create mode 100644 docs/manual-EN/5.appendix/vid-partition.md diff --git a/docs/manual-EN/1.overview/1.concepts/1.data-model.md b/docs/manual-EN/1.overview/1.concepts/1.data-model.md index 3fa9c995a1d..ca2e3877ca4 100644 --- a/docs/manual-EN/1.overview/1.concepts/1.data-model.md +++ b/docs/manual-EN/1.overview/1.concepts/1.data-model.md @@ -23,7 +23,7 @@ To better understand the elements of a graph data model, let us walk through eac ## Vertices -Vertices are typically used to represent entities in the real world. In the preceding example, the graph contains eleven vertices. +Vertices are typically used to represent entities in the real world. In **Nebula Graph**, vertices are identified with vertex identifiers (i.e. VIDs). The `VID` must be unique in the graph space. In the preceding example, the graph contains eleven vertices. diff --git a/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/insert-vertex-syntax.md b/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/insert-vertex-syntax.md index 4fa20bf169b..10dc4a9eb70 100644 --- a/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/insert-vertex-syntax.md +++ b/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/insert-vertex-syntax.md @@ -2,7 +2,7 @@ ```ngql INSERT VERTEX [, , ...] 
(prop_name_list[, prop_name_list]) - {VALUES | VALUE} vid: (prop_value_list[, prop_value_list]) + {VALUES | VALUE} VID: (prop_value_list[, prop_value_list]) prop_name_list: [prop_name [, prop_name] ...] @@ -15,7 +15,7 @@ The `INSERT VERTEX` statement inserts a vertex or vertices into **Nebula Graph** * `tag_name` denotes the `tag` (vertex type), which must be created before `INSERT VERTEX`. * `prop_name_list` is the property name list in the given `tag_name`. -* `vid` is the vertex ID. The current sorting basis is "binary coding order", i.e. 0, 1, 2, ... 9223372036854775807, -9223372036854775808, -9223372036854775807, ..., -1. `vid` supports specifying ID manually, or call hash() function to generate. +* `VID` is the vertex ID. The `VID` must be unique in the graph space. The current sorting basis is "binary coding order", i.e. 0, 1, 2, ... 9223372036854775807, -9223372036854775808, -9223372036854775807, ..., -1. `VID` supports specifying ID manually, or call hash() function to generate. * `prop_value_list` must provide the value list according to the `prop_name_list`. If no value matches the type, an error will be returned. diff --git a/docs/manual-EN/5.appendix/vid-partition.md b/docs/manual-EN/5.appendix/vid-partition.md new file mode 100644 index 00000000000..bc07a0930fb --- /dev/null +++ b/docs/manual-EN/5.appendix/vid-partition.md @@ -0,0 +1,11 @@ +# Vertex Identifier and Partition + +This document provides some introductions on vertex identifier (`VID` for short) and partition. + +In **Nebula Graph**, vertices are identified with vertex identifiers (i.e. VIDs). When inserting a vertex, you can either assign an id manually or use the hash function to generate an id for the vertex. The `VID` must be unique in the graph space. + +When querying in a **Nebula Graph** cluster, data has to be exchanged between different cluster nodes if the data is sharded into different partitions and therefore residing on multiple nodes. 
In particular graph traversals are usually executed on a Coordinator, because they need global information. This results in a lot of network traffic and potentially slow query execution. + +To achieve single-server alike query execution times for graph queries in a cluster, you need to shard vertices based on their tags so that vertices with the same tags are stored on the same partition. This can improve data locality and reduce the number of network hops between cluster nodes. + +If you want all the vertices with the same tag to store on the same partition, you need to make sure that all the vertex VIDs have the same modulus. And all edges connecting these vertices are stored on this partition as well. diff --git a/docs/manual-EN/README.md b/docs/manual-EN/README.md index e848a9d0eee..aca2fe587eb 100644 --- a/docs/manual-EN/README.md +++ b/docs/manual-EN/README.md @@ -167,7 +167,7 @@ It is the optimal solution in the world capable of hosting graphs with dozens of * [Gremlin V.S. nGQL](5.appendix/gremlin-ngql.md) * [Cypher V.S. nGQL](5.appendix/cypher-ngql.md) * [SQL V.S. 
nGQL](5.appendix/sql-ngql.md) - +* [Vertex Identifier and Partition](5.appendix/vid-partition.md) ## Misc diff --git a/mkdocs.yml b/mkdocs.yml index 02e1d904410..12802a40da8 100755 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -208,9 +208,8 @@ nav: - Cypher & nGQL: manual-EN/5.appendix/cypher-ngql.md - Gremlin & nGQL: manual-EN/5.appendix/gremlin-ngql.md - SQL & nGQL: manual-EN/5.appendix/sql-ngql.md - # - Upgrading Nebula Graph: manual-EN/5.appendix/upgrade-guide.md - # - Download PDF: - # - https://oss-cdn.nebula-graph.io/doc/v1.0.0-en.pdf + - Vertex Identifier and Partition: manual-EN/5.appendix/vid-partition.md + - 中文手册: - https://docs.nebula-graph.com.cn/ From e47109e8e4d2df13bed73fcc3ebb1c63cae80631 Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Fri, 31 Jul 2020 16:41:30 +0800 Subject: [PATCH 2/8] update --- docs/manual-EN/5.appendix/vid-partition.md | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/docs/manual-EN/5.appendix/vid-partition.md b/docs/manual-EN/5.appendix/vid-partition.md index bc07a0930fb..57fbe038620 100644 --- a/docs/manual-EN/5.appendix/vid-partition.md +++ b/docs/manual-EN/5.appendix/vid-partition.md @@ -2,10 +2,22 @@ This document provides some introductions on vertex identifier (`VID` for short) and partition. -In **Nebula Graph**, vertices are identified with vertex identifiers (i.e. VIDs). When inserting a vertex, you can either assign an id manually or use the hash function to generate an id for the vertex. The `VID` must be unique in the graph space. +In **Nebula Graph**, vertices are identified with vertex identifiers (i.e. `VID`s). When inserting a vertex, you must specify a `VID` for it. You can generate `VID`s either with your own application or with the hash function provided by **Nebula Graph**.
-When querying in a **Nebula Graph** cluster, data has to be exchanged between different cluster nodes if the data is sharded into different partitions and therefore residing on multiple nodes. In particular graph traversals are usually executed on a Coordinator, because they need global information. This results in a lot of network traffic and potentially slow query execution. +`VID`s must be unique in a graph space. That is, in the same graph space, vertices with the same `VID` are considered as the same vertex. `VID`s in different graph spaces are independent of each other. In addition, one `VID` can have multiple `TAG`s. -To achieve single-server alike query execution times for graph queries in a cluster, you need to shard vertices based on their tags so that vertices with the same tags are stored on the same partition. This can improve data locality and reduce the number of network hops between cluster nodes. +The relation between `VID` and partition is: -If you want all the vertices with the same tag to store on the same partition, you need to make sure that all the vertex VIDs have the same modulus. And all edges connecting these vertices are stored on this partition as well. +```text +VID mod partition_number = partition ID +``` + +In the preceding formula, + +- `mod` is the modulo operation. +- `partition_number` is the number of partition for the graph space where the `VID` is located, namely the value of `partition_num` in the [CREATE SPACE](../2.query-language/4.statement-syntax/1.data-definition-statements/create-space-syntax.md) statement. +- `partition ID` the ID for the partition where the `VID` is located. + +Therefore, if you want some certain vertices to locate on the same partition (i.e. on the same machine), you can control the generation of the `VID`s by using the preceding formula. + +In addition, the correspondence between the `partition ID` and the machines are random. 
Therefore, you can't assume that any two partitions are located on the same machine. From dd539266e9d2b7c98f056e2cc8603f6cc54a40d4 Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Wed, 5 Aug 2020 11:12:39 +0800 Subject: [PATCH 3/8] update --- .../2.data-query-and-manipulation-statements/fetch-syntax.md | 2 +- .../1.build/1.build-source-code.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/fetch-syntax.md b/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/fetch-syntax.md index 968216a9127..52826bbec5c 100644 --- a/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/fetch-syntax.md +++ b/docs/manual-EN/2.query-language/4.statement-syntax/2.data-query-and-manipulation-statements/fetch-syntax.md @@ -13,7 +13,7 @@ FETCH PROP ON * `*` indicates returning all the properties of the given vertex. -`::=[tag_name [, tag_name]]` is the tag name. It must be the same tag within return_list. +`::=[tag_name [, tag_name]]` is the tag name. It must be the same tag within return_list. `::=[vertex_id [, vertex_id]]` is a list of vertex IDs separated by comma (,). 
diff --git a/docs/manual-EN/3.build-develop-and-administration/1.build/1.build-source-code.md b/docs/manual-EN/3.build-develop-and-administration/1.build/1.build-source-code.md index 28d5b80a30c..a435365046c 100644 --- a/docs/manual-EN/3.build-develop-and-administration/1.build/1.build-source-code.md +++ b/docs/manual-EN/3.build-develop-and-administration/1.build/1.build-source-code.md @@ -106,12 +106,12 @@ $ sudo make install $ cd /usr/local/nebula $ sudo cp etc/nebula-storaged.conf.production etc/nebula-storaged.conf $ sudo cp etc/nebula-metad.conf.production etc/nebula-metad.conf -$ sudo cp etc/nebula-metad.conf.production etc/nebula-metad.conf +$ sudo cp etc/nebula-graphd.conf.production etc/nebula-graphd.conf # For trial $ cd /usr/local/nebula $ sudo cp etc/nebula-storaged.conf.default etc/nebula-storaged.conf $ sudo cp etc/nebula-metad.conf.default etc/nebula-metad.conf -$ sudo cp etc/nebula-metad.conf.default etc/nebula-metad.conf +$ sudo cp etc/nebula-graphd.conf.default etc/nebula-graphd.conf ``` See the [Start and Stop Nebula Graph Services Doc](../2.install/2.start-stop-service.md) for details. From 7e3de114f1b9410b877a4c2c1959d28225f26179 Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Thu, 6 Aug 2020 13:34:11 +0800 Subject: [PATCH 4/8] minors --- .../2.query-language/1.data-types/type-conversion.md | 12 ++++-------- .../3.configurations/0.system-requirement.md | 2 +- 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/docs/manual-EN/2.query-language/1.data-types/type-conversion.md b/docs/manual-EN/2.query-language/1.data-types/type-conversion.md index 1c4ddb2117b..436199b1142 100644 --- a/docs/manual-EN/2.query-language/1.data-types/type-conversion.md +++ b/docs/manual-EN/2.query-language/1.data-types/type-conversion.md @@ -8,9 +8,9 @@ Implicit conversions are automatically performed when a value is copied to a com 1. 
Following types can implicitly converted to `bool`: -- The conversions from/to bool consider `false` equivalent to `0` for empty string types, true is equivalent to all other values. -- The conversions from/to bool consider `false` equivalent to `0` for int types, true is equivalent to all other values. -- The conversions from/to bool consider `false` equivalent to `0.0` for float types, true is equivalent to all other values. + - The conversions from/to bool consider `false` equivalent to `0` for empty string types, true is equivalent to all other values. + - The conversions from/to bool consider `false` equivalent to `0` for int types, true is equivalent to all other values. + - The conversions from/to bool consider `false` equivalent to `0.0` for float types, true is equivalent to all other values. 2. `int` can implicitly converted to `double`. @@ -20,8 +20,4 @@ In addition to implicit type conversion, explicit type conversion is also suppor `(type_name)expression`. -For example, the results of -`YIELD length((string)(123)), (int)"123" + 1` - -are `3, 124` respectively. -And `YIELD (int)("12ab3")` fails in conversion. +For example, the results of `YIELD length((string)(123)), (int)"123" + 1` are `3, 124` respectively. The results of `YIELD (int)(TRUE)` is `1`. And `YIELD (int)("12ab3")` fails in conversion. 
diff --git a/docs/manual-EN/3.build-develop-and-administration/3.configurations/0.system-requirement.md b/docs/manual-EN/3.build-develop-and-administration/3.configurations/0.system-requirement.md index 64ad6a059a6..ca76970d08a 100644 --- a/docs/manual-EN/3.build-develop-and-administration/3.configurations/0.system-requirement.md +++ b/docs/manual-EN/3.build-develop-and-administration/3.configurations/0.system-requirement.md @@ -54,7 +54,7 @@ Take AWS EC2 c5d.xlarge as an example: ## Resource Estimation (Three Replicas) * Storage space (full cluster): number of edges and vertices * average bytes of attributes * 6 -* Memory (full cluster): number of edges and vertices * 5 bytes + number of RocksDB instances * (write_buffer_size * max_write_buffer_number + rocksdb_block_cache), where each directory in the `--data_path` item in the `etc/nebula-storaged.conf` file corresponds to a RocksDB instance +* Memory (full cluster): number of edges and vertices * 15 bytes + number of RocksDB instances * (write_buffer_size * max_write_buffer_number + rocksdb_block_cache), where each directory in the `--data_path` item in the `etc/nebula-storaged.conf` file corresponds to a RocksDB instance * Partitions number of a graph space: number of disks in the cluster * (2 to 10), the better performance of the hard disk, the larger the value. * Reserve 20% space for memory and hard disk buffer. 
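The resource estimation formulas in the hunk above (storage, and memory with the updated 15 bytes per element) can be sketched as a small calculator. This is an illustration only and not part of the patch. The function names and sample inputs are hypothetical, every size is assumed to be in bytes (the real configuration items may use other units and would need converting), and reading "reserve 20%" as adding 20% on top of the estimate is an assumption.

```python
def estimate_storage_bytes(num_elements: int, avg_property_bytes: int) -> int:
    """Storage for the full cluster: elements * average property bytes * 6."""
    return num_elements * avg_property_bytes * 6


def estimate_memory_bytes(
    num_elements: int,
    rocksdb_instances: int,
    write_buffer_size: int,
    max_write_buffer_number: int,
    block_cache_bytes: int,
) -> int:
    """Memory for the full cluster, following the formula in the doc.

    num_elements is the combined number of edges and vertices.
    rocksdb_instances is the number of directories listed in --data_path.
    """
    per_instance = write_buffer_size * max_write_buffer_number + block_cache_bytes
    return num_elements * 15 + rocksdb_instances * per_instance


def with_reserve(estimate_bytes: int, reserve_percent: int = 20) -> int:
    """Add a buffer on top of an estimate (assumed reading of 'reserve 20%').

    Integer arithmetic with rounding up, so the result never undershoots.
    """
    return (estimate_bytes * (100 + reserve_percent) + 99) // 100


if __name__ == "__main__":
    # Hypothetical cluster: 1 billion edges and vertices, 100-byte properties,
    # 2 RocksDB instances, 64 MiB write buffers (4 per instance), 1 GiB block cache.
    memory = estimate_memory_bytes(1_000_000_000, 2, 64 * 1024 * 1024, 4, 1024**3)
    storage = estimate_storage_bytes(1_000_000_000, 100)
    print("memory bytes:", with_reserve(memory))
    print("storage bytes:", with_reserve(storage))
```

With these sample inputs the per-element term (15 GB) dominates the RocksDB buffers (about 2.7 GB), which is why raising the constant from 5 to 15 bytes changes the estimate substantially.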
From 63275520367be4748d7c1792562c324b49cafdf9 Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Mon, 10 Aug 2020 10:57:45 +0800 Subject: [PATCH 5/8] fix-comments --- .../manual-EN/1.overview/1.concepts/1.data-model.md | 13 ++++++++++--- docs/manual-EN/5.appendix/vid-partition.md | 2 +- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/docs/manual-EN/1.overview/1.concepts/1.data-model.md b/docs/manual-EN/1.overview/1.concepts/1.data-model.md index ca2e3877ca4..f5c39795383 100644 --- a/docs/manual-EN/1.overview/1.concepts/1.data-model.md +++ b/docs/manual-EN/1.overview/1.concepts/1.data-model.md @@ -29,7 +29,7 @@ Vertices are typically used to represent entities in the real world. In **Nebula ## Tags -In **Nebula Graph**, vertex properties are clustered by **tags**. In the example above, the vertices have tags **player** and **team**. +In **Nebula Graph**, vertex properties are clustered by **tags**. One vertex can have one to multiple tags. In the preceding example, the vertices have tags **player** and **team**. @@ -43,13 +43,20 @@ Edges are used to connect vertices. Each edge usually represents a relationship `Each edge` is an instance of an edge type. Our example uses _**serve**_ and _**like**_ as edge types. Take edge _**serve**_ for example, in the preceding picture, vertex `101` (represents a **player**) is the source vertex and vertex `215` (represents a **team**) is the target vertex. We see that vertex `101` has an outgoing edge while vertex `215` has an incoming edge. -## Properties +## Properties of Vertices and Edges -Properties are named-value pairs within vertices and edges. In our example graph, we have used the properties `id`, `name` and `age` on **player**, `id` and `name` on **team**, and `likeness` on _**like**_ edge. +Both vertices and edges can have properties. Properties are described with key value pairs. 
In our example graph, we have used the properties `id`, `name` and `age` on **player**, `id` and `name` on **team**, and `likeness` on _**like**_ edge. ## Edge Rank Edge rank is an immutable user-assigned 64-bit signed integer. It affects the edge order of the same edge type between two vertices. The edge with a higher rank value comes first. When not specified, the default rank value is zero. The current sorting basis is "binary coding order", i.e. 0, 1, 2, ... 9223372036854775807, -9223372036854775808, -9223372036854775807, ..., -1. +In addition to an edge type, the edge between two vertices must have an edge rank. The edge rank is a 64-bit integer assigned by the user; if not specified, the edge rank defaults to 0. + +An edge can be represented uniquely with the [source vertex, edge type, edge rank and the dest vertex]. + +The edge rank affects the edge order of the same edge type between two vertices. The edge with a higher rank value comes first. + +The current sorting basis is "binary coding order", i.e. 0, 1, 2, ... 9223372036854775807, -9223372036854775808, -9223372036854775807, ..., -1. ## Schema diff --git a/docs/manual-EN/5.appendix/vid-partition.md b/docs/manual-EN/5.appendix/vid-partition.md index 57fbe038620..e40eb2cdd97 100644 --- a/docs/manual-EN/5.appendix/vid-partition.md +++ b/docs/manual-EN/5.appendix/vid-partition.md @@ -9,7 +9,7 @@ In **Nebula Graph**, vertices are identified with vertex identifiers (i.e. 
`VID` The relation between `VID` and partition is: ```text -VID mod partition_number = partition ID +VID mod partition_number = partition ID + 1 ``` In the preceding formula, From a88b8a95c3d1b028da78d93567658066084daab3 Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Mon, 10 Aug 2020 11:29:30 +0800 Subject: [PATCH 6/8] add example --- docs/manual-EN/5.appendix/vid-partition.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/manual-EN/5.appendix/vid-partition.md b/docs/manual-EN/5.appendix/vid-partition.md index e40eb2cdd97..85128139977 100644 --- a/docs/manual-EN/5.appendix/vid-partition.md +++ b/docs/manual-EN/5.appendix/vid-partition.md @@ -6,6 +6,8 @@ In **Nebula Graph**, vertices are identified with vertex identifiers (i.e. `VID` `VID`s must be unique in a graph space. That is, in the same graph space, vertices with the same `VID` are considered as the same vertex. `VID`s in different graph spaces are independent of each other. In addition, one `VID` can have multiple `TAG`s. +When inserting data into **Nebula Graph**, vertices and edges will distribute to different partitions. And the partitions are located on different machines. If you want some certain vertices to locate on the same partition (i.e. on the same machine), you can control the generation of the `VID`s by using the following formula. + The relation between `VID` and partition is: ```text @@ -18,6 +20,6 @@ In the preceding formula, - `partition_number` is the number of partition for the graph space where the `VID` is located, namely the value of `partition_num` in the [CREATE SPACE](../2.query-language/4.statement-syntax/1.data-definition-statements/create-space-syntax.md) statement. - `partition ID` the ID for the partition where the `VID` is located. -Therefore, if you want some certain vertices to locate on the same partition (i.e. on the same machine), you can control the generation of the `VID`s by using the preceding formula. 
+For example, if there are 100 partitions, the vertices with `VID` 1, 11, 101, 1001 will be stored on the same partition. In addition, the correspondence between the `partition ID` and the machines are random. Therefore, you can't assume that any two partitions are located on the same machine. From 28303c08398b83a22f8088b097a33ae8e982344e Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Mon, 10 Aug 2020 11:42:51 +0800 Subject: [PATCH 7/8] update compact --- .../5.storage-service-administration/compact.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/manual-EN/3.build-develop-and-administration/5.storage-service-administration/compact.md b/docs/manual-EN/3.build-develop-and-administration/5.storage-service-administration/compact.md index 995f99645f5..6b541470a6f 100644 --- a/docs/manual-EN/3.build-develop-and-administration/5.storage-service-administration/compact.md +++ b/docs/manual-EN/3.build-develop-and-administration/5.storage-service-administration/compact.md @@ -12,7 +12,7 @@ By default, the `disable_auto_compactions` parameter is set to `false`. Before d - The customized compact style for **Nebula Graph**. You can run the `SUBMIT JOB COMPACT` command to start it. You can use it to perform large scale background operations such as sst files merging in large scale or TTL. This kind of compact is usually performed after midnight. -In addition, you can modify the number of threads in both methods by the following command. You can decrease the threads during daytime and increase it at night. +In addition, you can modify the number of threads in both methods by the following command. 
```ngql nebula> UPDATE CONFIGS storage:rocksdb_db_options = \ From bcd43c854e827bbf938e69f9229977e3929c27c9 Mon Sep 17 00:00:00 2001 From: Amber1990Zhang <1345783682@qq.com> Date: Tue, 11 Aug 2020 10:02:14 +0800 Subject: [PATCH 8/8] update3 --- docs/manual-EN/1.overview/1.concepts/1.data-model.md | 2 +- docs/manual-EN/5.appendix/vid-partition.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/manual-EN/1.overview/1.concepts/1.data-model.md b/docs/manual-EN/1.overview/1.concepts/1.data-model.md index f5c39795383..7f30edb05dd 100644 --- a/docs/manual-EN/1.overview/1.concepts/1.data-model.md +++ b/docs/manual-EN/1.overview/1.concepts/1.data-model.md @@ -29,7 +29,7 @@ Vertices are typically used to represent entities in the real world. In **Nebula ## Tags -In **Nebula Graph**, vertex properties are clustered by **tags**. One vertex can have one to multiple tags. In the preceding example, the vertices have tags **player** and **team**. +In **Nebula Graph**, vertex properties are clustered by **tags**. One vertex can have one or more tags. In the preceding example, the vertices have tags **player** and **team**. diff --git a/docs/manual-EN/5.appendix/vid-partition.md b/docs/manual-EN/5.appendix/vid-partition.md index 85128139977..1b0f620945a 100644 --- a/docs/manual-EN/5.appendix/vid-partition.md +++ b/docs/manual-EN/5.appendix/vid-partition.md @@ -20,6 +20,6 @@ In the preceding formula, - `partition_number` is the number of partition for the graph space where the `VID` is located, namely the value of `partition_num` in the [CREATE SPACE](../2.query-language/4.statement-syntax/1.data-definition-statements/create-space-syntax.md) statement. - `partition ID` the ID for the partition where the `VID` is located. -For example, if there are 100 partitions, the vertices with `VID` 1, 11, 101, 1001 will be stored on the same partition. +For example, if there are 100 partitions, the vertices with `VID` 1, 101, 1001 will be stored on the same partition. 
In addition, the correspondence between the `partition ID` and the machines are random. Therefore, you can't assume that any two partitions are located on the same machine.
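The modulo relation described in `vid-partition.md` can be checked with a short sketch. It is not part of the patch series. The helper names are hypothetical, only non-negative `VID`s are handled, and the code compares the residue `VID mod partition_number` without fixing the offset between that residue and the actual `partition ID`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_residue(vid: int, partition_num: int) -> int:
    """Return VID mod partition_num for a non-negative VID.

    Vertices whose VIDs share this residue land on the same partition.
    """
    if partition_num <= 0:
        raise ValueError("partition_num must be positive")
    if vid < 0:
        raise ValueError("this sketch only handles non-negative VIDs")
    return vid % partition_num


def group_by_residue(vids: Iterable[int], partition_num: int) -> Dict[int, List[int]]:
    """Group VIDs by residue so co-located vertices appear together."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for vid in vids:
        groups[partition_residue(vid, partition_num)].append(vid)
    return dict(groups)


if __name__ == "__main__":
    # With 100 partitions, 1, 101 and 1001 share residue 1; 11 does not.
    print(group_by_residue([1, 11, 101, 1001], 100))
```

Running it with 100 partitions groups 1, 101 and 1001 together and puts 11 in a different group, which matches the correction made in PATCH 8/8. To place a new vertex next to an existing one, an application could pick any unused `VID` with the same residue, for example the existing `VID` plus a multiple of `partition_num`.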