diff --git a/docs-2.0/8.service-tuning/load-balance.md b/docs-2.0/8.service-tuning/load-balance.md index 8da38402081..0690a409feb 100644 --- a/docs-2.0/8.service-tuning/load-balance.md +++ b/docs-2.0/8.service-tuning/load-balance.md @@ -1,10 +1,6 @@ # Storage load balance -You can use the `BALANCE` statements to balance the distribution of partitions and Raft leaders, or remove redundant Storage servers. - -## Prerequisites - -The graph spaces stored in Nebula Graph must have more than one replicas for the system to balance the distribution of partitions and Raft leaders. +You can use the `BALANCE` statement to balance the distribution of partitions and Raft leaders, or remove redundant Storage servers. ## Balance partition distribution @@ -12,85 +8,87 @@ The graph spaces stored in Nebula Graph must have more than one replicas for the !!! danger - DON'T stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the follow-up subtasks fail. - -Take scaling out Nebula Graph for an example. - -After you add new storage hosts into the cluster, no partition is deployed on the new hosts. You can run [`SHOW HOSTS`](../3.ngql-guide/7.general-query-statements/6.show/6.show-hosts.md) to check the partition distribution. - -```ngql -nebual> SHOW HOSTS; -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| Host | Port | Status | Leader count | Leader distribution | Partition distribution | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:15" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:15" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:15" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -``` - -Run `BALANCE DATA` to start balancing the storage partitions. If the partitions are already balanced, `BALANCE DATA` fails. - -```ngql -nebula> BALANCE DATA; -+------------+ -| ID | -+------------+ -| 1614237867 | -+------------+ -``` - -A BALANCE task ID is returned after running `BALANCE DATA`. Run `BALANCE DATA ` to check the status of the `BALANCE` task. 
- -```ngql -nebula> BALANCE DATA 1614237867; -+--------------------------------------------------------------+-------------------+ -| balanceId, spaceId:partId, src->dst | status | -+--------------------------------------------------------------+-------------------+ -| "[1614237867, 11:1, storaged1:9779->storaged3:9779]" | "SUCCEEDED" | -+--------------------------------------------------------------+-------------------+ -| "[1614237867, 11:1, storaged2:9779->storaged4:9779]" | "SUCCEEDED" | -+--------------------------------------------------------------+-------------------+ -| "[1614237867, 11:2, storaged1:9779->storaged3:9779]" | "SUCCEEDED" | -+--------------------------------------------------------------+-------------------+ -... -+--------------------------------------------------------------+-------------------+ -| "Total:22, Succeeded:22, Failed:0, In Progress:0, Invalid:0" | 100 | -+--------------------------------------------------------------+-------------------+ -``` - -When all the subtasks succeed, the load balancing process finishes. Run `SHOW HOSTS` again to make sure the partition distribution is balanced. - -!!! note - - `BALANCE DATA` does not balance the leader distribution. - -```ngql -nebula> SHOW HOSTS; -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| Host | Port | Status | Leader count | Leader distribution | Partition distribution | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:9" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:9" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" | -+-------------+------+----------+--------------+-----------------------------------+------------------------+ -``` + DO NOT stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the follow-up subtasks fail. + +### Examples + +After you add new storage hosts into the cluster, no partition is deployed on the new hosts. + +1. Run [`SHOW HOSTS`](../3.ngql-guide/7.general-query-statements/6.show/6.show-hosts.md) to check the partition distribution. 
+
+    ```ngql
+    nebula> SHOW HOSTS;
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:15" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:15" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:15" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    ```
+
+2. Run `BALANCE DATA` to start balancing the storage partitions. If the partitions are already balanced, `BALANCE DATA` fails.
+
+    ```ngql
+    nebula> BALANCE DATA;
+    +------------+
+    | ID |
+    +------------+
+    | 1614237867 |
+    +------------+
+    ```
+
+3. A `BALANCE` task ID is returned after running `BALANCE DATA`. Run `BALANCE DATA <balance_id>` to check the status of the `BALANCE` task.
+
+    ```ngql
+    nebula> BALANCE DATA 1614237867;
+    +--------------------------------------------------------------+-------------------+
+    | balanceId, spaceId:partId, src->dst | status |
+    +--------------------------------------------------------------+-------------------+
+    | "[1614237867, 11:1, storaged1:9779->storaged3:9779]" | "SUCCEEDED" |
+    +--------------------------------------------------------------+-------------------+
+    | "[1614237867, 11:1, storaged2:9779->storaged4:9779]" | "SUCCEEDED" |
+    +--------------------------------------------------------------+-------------------+
+    | "[1614237867, 11:2, storaged1:9779->storaged3:9779]" | "SUCCEEDED" |
+    +--------------------------------------------------------------+-------------------+
+    ...
+    +--------------------------------------------------------------+-------------------+
+    | "Total:22, Succeeded:22, Failed:0, In Progress:0, Invalid:0" | 100 |
+    +--------------------------------------------------------------+-------------------+
+    ```
+
+4. When all the subtasks succeed, the load balancing process finishes. Run `SHOW HOSTS` again to make sure the partition distribution is balanced.
+
+    !!! note
+
+        `BALANCE DATA` does not balance the leader distribution. For more information, see [Balance leader distribution](#balance-leader-distribution).
+
+    ```ngql
+    nebula> SHOW HOSTS;
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:9" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:9" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    | "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
+    +-------------+------+----------+--------------+-----------------------------------+------------------------+
+    ```
 
 If any subtask fails, run `BALANCE DATA` again to restart the balancing. If redoing load balancing does not solve the problem, ask for help in the [Nebula Graph community](https://discuss.nebula-graph.io/).
 
@@ -100,17 +98,15 @@ To stop a balance task, run `BALANCE DATA STOP`.
 
 * If no balance task is running, an error is returned.
 
-* If a balance task is running, the task ID is returned.
+* If a balance task is running, the task ID (`balance_id`) is returned.
 
-`BALANCE DATA STOP` does not stop the running subtasks but cancels all follow-up subtasks. The running subtasks continue.
-
-To check the status of the stopped balance task, run `BALANCE DATA `.
+`BALANCE DATA STOP` does not stop the running subtasks but cancels all follow-up subtasks. To check the status of the stopped balance task, run `BALANCE DATA <balance_id>`.
 
 Once all the subtasks are finished or stopped, you can run `BALANCE DATA` again to balance the partitions again.
 
-* If any subtask of the preceding balance task failed, Nebula Graph restarts the preceding balance task.
+* If any subtask of the preceding balance task fails, Nebula Graph restarts the preceding balance task.
 
-* If no subtask of the preceding balance task failed, Nebula Graph starts a new balance task.
+* If no subtask of the preceding balance task fails, Nebula Graph starts a new balance task.
 
 ## RESET a balance task
 
@@ -118,30 +114,34 @@ If a balance task fails to be restarted after being stopped, run `BALANCE DATA R
 
 ## Remove storage servers
 
-To remove specific storage servers and scale in the Storage Service, use the `BALANCE DATA REMOVE ` syntax.
+To remove specified storage servers and scale in the Storage Service, run `BALANCE DATA REMOVE <host:port>`.
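+
+A sketch of the general form, inferred from the example below (the host addresses are placeholders, not real servers; separate multiple addresses with commas):
+
+```ngql
+# Placeholders only: replace each <host:port> with the address of a Storage server to remove.
+BALANCE DATA REMOVE <host1:port> [,<host2:port> ...];
+```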
+
+### Example
 
-For example, to remove the following storage servers:
+To remove the following storage servers:
 
-|Server name|IP|Port|
-|-|-|-|
-|storage3|192.168.0.8|19779|
-|storage4|192.168.0.9|19779|
+|Server name|IP address|Port|
+|:---|:---|:---|
+|storage3|192.168.0.8|9779|
+|storage4|192.168.0.9|9779|
 
-Run the following statement:
+Run the following command:
 
 ```ngql
-BALANCE DATA REMOVE 192.168.0.8:19779,192.168.0.9:19779;
+BALANCE DATA REMOVE 192.168.0.8:9779,192.168.0.9:9779;
```
 
 Nebula Graph will start a balance task, migrate the storage partitions in storage3 and storage4, and then remove them from the cluster.
 
-!!! note
+!!! note
 
-    The removed server's state will change to `OFFLINE`.
+    The state of the removed server will change to `OFFLINE`, and its host record will be deleted after one day. To retain the record longer, change the value of `removed_threshold_sec` in the Meta service configuration.
 
 ## Balance leader distribution
 
-`BALANCE DATA` only balances the partition distribution. If the raft leader distribution is not balanced, some of the leaders may overload. To load balance the raft leaders, run `BALANCE LEADER`.
+`BALANCE DATA` only balances the partition distribution. If the Raft leader distribution is not balanced, the hosts holding more leaders may be overloaded. To balance the Raft leaders, run `BALANCE LEADER`.
+
+### Example
 
 ```ngql
 nebula> BALANCE LEADER;
@@ -167,3 +167,7 @@ nebula> SHOW HOSTS;
 | "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
 +-------------+------+----------+--------------+-----------------------------------+------------------------+
 ```
+
+!!! caution
+
+    In Nebula Graph {{ nebula.release }}, switching leaders will cause a large number of short-term request errors (Storage Error `E_RPC_FAILURE`). For solutions, see [FAQ](../20.appendix/0.FAQ.md).