rebalancer: parallel routes applying
Closes #161

@TarantoolBot document
Title: vshard rebalancer_max_sending

Since its first release, VShard has had a rather simple rebalancer: one process on one node calculates who should send how many buckets to whom. The nodes applied these so-called routes one by one, sequentially.

Unfortunately, such a simple scheme is not fast enough, especially for Vinyl, where the cost of reading from disk is comparable to the cost of the network. In fact, with Vinyl the rebalancer routes applier was sleeping most of the time.

Now each node can send multiple buckets in parallel, in a round-robin manner, to several destinations or even to a single one.

A new option, rebalancer_max_sending, sets the degree of parallelism. It is specified in the root table of a storage configuration:

    cfg.rebalancer_max_sending = 5
    vshard.storage.cfg(cfg, box.info.uuid)

Routers ignore the option.

Note that a max_sending of N will not necessarily give an N-fold speedup. That depends on the network, the disk, and the number of other fibers in the system.

The default value is 1. The maximum value is 15.

Another important point: from now on rebalancer_max_receiving is no longer useless. It can actually limit the load on one storage. Consider an example. You have 10 replicasets and add a new one. All 10 replicasets now try to send buckets to the new one. Assume each replicaset has max_sending set to 5. The new replicaset then experiences a rather high load of 50 buckets being downloaded at once. If the node needs to do other work, such a load may be undesirable. Too many parallel buckets can also lead to timeouts in the rebalancing process itself.

In that case you can either lower rebalancer_max_sending on the old replicasets or decrease rebalancer_max_receiving on the new one. In the latter case some workers on the old nodes will be throttled, and you will see that in the logs.

rebalancer_max_sending matters if you have a restriction on how many buckets can be read-only at once in the cluster. Recall that a bucket does not accept new write requests while it is being sent.

For example, you have 100000 buckets, and each bucket stores ~0.001% of your data. The cluster has 10 replicasets, and you can never afford more than 0.1% of the data locked for writes. 0.1% of the data is 100 buckets, so you should not set rebalancer_max_sending > 10 on these nodes. That guarantees the rebalancer will not send more than 100 buckets at once in the whole cluster.

Keep in mind that if max_sending is set too high and max_receiving too low, some buckets will try to relocate and fail, which wastes network resources and time. Configure these parameters so that they do not contradict each other.
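
The two worked examples in the message above reduce to simple arithmetic. The following is a minimal sketch in plain Lua that redoes it. The helper names `peak_incoming` and `max_sending_for_budget` are hypothetical and not part of vshard, and the sketch assumes data is spread evenly across buckets and that rebalancer_max_receiving acts as a hard cap on simultaneously received buckets, as the message describes.

```lua
-- Hypothetical helpers, NOT part of vshard: they only redo the
-- arithmetic from the commit message.

-- Worst-case number of buckets arriving at one new replicaset when
-- every old replicaset sends to it with all of its workers.
local function peak_incoming(senders, max_sending, max_receiving)
    local offered = senders * max_sending
    -- Assumption: rebalancer_max_receiving caps what the receiver
    -- accepts; the remaining sender workers are throttled.
    return math.min(offered, max_receiving)
end

-- Largest rebalancer_max_sending per replicaset that keeps the
-- number of buckets in flight (and so locked for writes) within a
-- cluster-wide budget. A result of 0 means the budget cannot be met.
local function max_sending_for_budget(bucket_count, locked_fraction,
                                      replicasets)
    -- Small epsilon tolerates floating point error in the product.
    local locked_buckets = math.floor(bucket_count * locked_fraction + 1e-9)
    local per_replicaset = math.floor(locked_buckets / replicasets)
    -- The commit message states that the maximal value is 15.
    return math.min(per_replicaset, 15)
end

-- 10 old replicasets, 5 workers each, a generous receive limit.
print(peak_incoming(10, 5, 100))
-- Same cluster, but the new replicaset accepts at most 20 at once.
print(peak_incoming(10, 5, 20))
-- 100000 buckets, at most 0.1% of data locked, 10 replicasets.
print(max_sending_for_budget(100000, 0.001, 10))
```

The first call reproduces the 50 simultaneous buckets from the message, and the last one the limit of 10 per replicaset. The second call only illustrates the effect of a lower receive limit; the value 20 is an arbitrary example, not a recommended setting.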
Showing 11 changed files with 346 additions and 28 deletions.
@@ -0,0 +1,116 @@
-- test-run result file version 2
test_run = require('test_run').new()
 | ---
 | ...

REPLICASET_1 = { 'box_1_a', 'box_1_b' }
 | ---
 | ...
REPLICASET_2 = { 'box_2_a', 'box_2_b' }
 | ---
 | ...
REPLICASET_3 = { 'box_3_a', 'box_3_b' }
 | ---
 | ...
REPLICASET_4 = { 'box_4_a', 'box_4_b' }
 | ---
 | ...
engine = test_run:get_cfg('engine')
 | ---
 | ...

test_run:create_cluster(REPLICASET_1, 'rebalancer')
 | ---
 | ...
test_run:create_cluster(REPLICASET_2, 'rebalancer')
 | ---
 | ...
test_run:create_cluster(REPLICASET_3, 'rebalancer')
 | ---
 | ...
test_run:create_cluster(REPLICASET_4, 'rebalancer')
 | ---
 | ...
util = require('util')
 | ---
 | ...
util.wait_master(test_run, REPLICASET_1, 'box_1_a')
 | ---
 | ...
util.wait_master(test_run, REPLICASET_2, 'box_2_a')
 | ---
 | ...
util.wait_master(test_run, REPLICASET_3, 'box_3_a')
 | ---
 | ...
util.wait_master(test_run, REPLICASET_4, 'box_4_a')
 | ---
 | ...
util.map_evals(test_run, {REPLICASET_1, REPLICASET_2, REPLICASET_3, \
                          REPLICASET_4}, 'bootstrap_storage(\'%s\')', engine)
 | ---
 | ...

--
-- The test is about parallel rebalancer. It is not very different
-- from a normal rebalancer except the problem of max receiving
-- bucket limit. Workers should correctly handle that, and of
-- course rebalancing should never totally stop.
--

util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'add_replicaset()')
 | ---
 | ...
util.map_evals(test_run, {REPLICASET_1, REPLICASET_2, REPLICASET_3}, 'add_second_replicaset()')
 | ---
 | ...
-- 4 replicasets, 1 sends to 3. It has 5 workers. It means, that
-- throttling is inevitable.
util.map_evals(test_run, {REPLICASET_1, REPLICASET_2, REPLICASET_3, REPLICASET_4}, [[\
    cfg.rebalancer_max_receiving = 1\
    vshard.storage.cfg(cfg, box.info.uuid)\
]])
 | ---
 | ...

test_run:switch('box_1_a')
 | ---
 | - true
 | ...
vshard.storage.bucket_force_create(1, 200)
 | ---
 | - true
 | ...
t1 = fiber.time()
 | ---
 | ...
wait_rebalancer_state('The cluster is balanced ok', test_run)
 | ---
 | ...
t2 = fiber.time()
 | ---
 | ...
-- Rebalancing should not stop. It can be checked by watching if
-- there was a sleep REBALANCER_WORK_INTERVAL (which is 10
-- seconds).
(t2 - t1 < 10) or {t1, t2}
 | ---
 | - true
 | ...

test_run:switch('default')
 | ---
 | - true
 | ...
test_run:drop_cluster(REPLICASET_4)
 | ---
 | ...
test_run:drop_cluster(REPLICASET_3)
 | ---
 | ...
test_run:drop_cluster(REPLICASET_2)
 | ---
 | ...
test_run:drop_cluster(REPLICASET_1)
 | ---
 | ...
@@ -0,0 +1,51 @@
test_run = require('test_run').new()

REPLICASET_1 = { 'box_1_a', 'box_1_b' }
REPLICASET_2 = { 'box_2_a', 'box_2_b' }
REPLICASET_3 = { 'box_3_a', 'box_3_b' }
REPLICASET_4 = { 'box_4_a', 'box_4_b' }
engine = test_run:get_cfg('engine')

test_run:create_cluster(REPLICASET_1, 'rebalancer')
test_run:create_cluster(REPLICASET_2, 'rebalancer')
test_run:create_cluster(REPLICASET_3, 'rebalancer')
test_run:create_cluster(REPLICASET_4, 'rebalancer')
util = require('util')
util.wait_master(test_run, REPLICASET_1, 'box_1_a')
util.wait_master(test_run, REPLICASET_2, 'box_2_a')
util.wait_master(test_run, REPLICASET_3, 'box_3_a')
util.wait_master(test_run, REPLICASET_4, 'box_4_a')
util.map_evals(test_run, {REPLICASET_1, REPLICASET_2, REPLICASET_3, \
                          REPLICASET_4}, 'bootstrap_storage(\'%s\')', engine)

--
-- The test is about parallel rebalancer. It is not very different
-- from a normal rebalancer except the problem of max receiving
-- bucket limit. Workers should correctly handle that, and of
-- course rebalancing should never totally stop.
--

util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'add_replicaset()')
util.map_evals(test_run, {REPLICASET_1, REPLICASET_2, REPLICASET_3}, 'add_second_replicaset()')
-- 4 replicasets, 1 sends to 3. It has 5 workers. It means, that
-- throttling is inevitable.
util.map_evals(test_run, {REPLICASET_1, REPLICASET_2, REPLICASET_3, REPLICASET_4}, [[\
    cfg.rebalancer_max_receiving = 1\
    vshard.storage.cfg(cfg, box.info.uuid)\
]])

test_run:switch('box_1_a')
vshard.storage.bucket_force_create(1, 200)
t1 = fiber.time()
wait_rebalancer_state('The cluster is balanced ok', test_run)
t2 = fiber.time()
-- Rebalancing should not stop. It can be checked by watching if
-- there was a sleep REBALANCER_WORK_INTERVAL (which is 10
-- seconds).
(t2 - t1 < 10) or {t1, t2}

test_run:switch('default')
test_run:drop_cluster(REPLICASET_4)
test_run:drop_cluster(REPLICASET_3)
test_run:drop_cluster(REPLICASET_2)
test_run:drop_cluster(REPLICASET_1)