Virtual sharding over buckets has a couple of strong points: data locality and a constant shard function. That significantly speeds up and simplifies rebalancing and requests for linked data.
But this relies on the bucket count being constant - that is the only thing that keeps the shard function constant. Or, actually, it was.
There is a way to dynamically change the number of actually stored buckets, proposed by @kostja and @alyapunov.
It is proposed to calculate the shard function just like now, but dynamically change the number of its bits that are actually used. For example, assume we use 10 bits of a 64-bit shard function value.
That gives 1024 buckets. Now assume it is not enough anymore and a user wants more. He says the new bucket count is 2048, and we start using 11 bits. Each existing bucket is split into 2 new buckets, which are then spread over the cluster by the rebalancer.
When this is also not enough, 12 bits are used, and so on.
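A minimal sketch of the idea, assuming the bucket id is taken from the low bits of the shard function value (the real vshard shard function is not shown here; the sha1-based hash below is a hypothetical stand-in, and using the low bits rather than the high ones is also just one possible choice):

```python
# Sketch: bucket id = low `bits` bits of a 64-bit shard function value.
# Raising `bits` by one doubles the bucket count and splits every bucket in two.
import hashlib

def shard_value(key: str) -> int:
    """64-bit shard function value; hypothetical stand-in for the real one."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], 'little')

def bucket_id(key: str, bits: int) -> int:
    """Bucket id is the low `bits` bits of the shard function value."""
    return shard_value(key) & ((1 << bits) - 1)

def split_targets(old_bucket: int, old_bits: int) -> tuple[int, int]:
    """When `bits` grows by one, bucket B splits into B and B + 2^old_bits."""
    return old_bucket, old_bucket + (1 << old_bits)

# With 10 bits there are 1024 buckets; with 11 bits, 2048.
key = 'some_key'
old = bucket_id(key, 10)
new = bucket_id(key, 11)
assert new in split_targets(old, 10)  # a key never leaves its bucket's split pair
```

Under this scheme no data has to be rehashed: the extra bit only decides into which of the two halves of its old bucket a key falls.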
In theory, decreasing the bucket count should work as the reverse of the algorithm above: we need to find which buckets are adjacent and merge them back together.
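Under the same assumptions, "adjacent" buckets are the two halves of a former bucket, i.e. B and B + 2^bits, and dropping one bit merges them back into B:

```python
# Sketch of the reverse step: going from `new_bits + 1` down to `new_bits`
# bits merges each pair of buckets B and B + 2^new_bits into bucket B.
def merge_source_pair(target_bucket: int, new_bits: int) -> tuple[int, int]:
    """Buckets whose data collapses into `target_bucket` after the decrease."""
    return target_bucket, target_bucket + (1 << new_bits)

# Going from 11 bits (2048 buckets) back to 10 bits (1024 buckets):
assert merge_source_pair(5, 10) == (5, 1029)  # buckets 5 and 1029 merge into 5
```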
Every storage should persist how many bits are currently used.
Probably that can even be applied to individual buckets if we store a bit count per bucket. It needs to be checked that this does not break the rule 'one shard function value = one bucket'.
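A hedged sketch of what that check could look like, assuming the same low-bits scheme: each bucket keeps its own bit count (similar to local depth in extendible hashing), and the rule holds iff every shard function value matches exactly one (bucket id, bits) pair:

```python
# Sketch: buckets is a map {bucket_id: bits}. The rule 'one shard function
# value = one bucket' holds iff every value has exactly one owner.
def owners(value: int, buckets: dict[int, int]) -> list[int]:
    """All buckets whose mask matches `value`; the rule requires exactly one."""
    return [b for b, bits in buckets.items() if value & ((1 << bits) - 1) == b]

# Bucket 0 kept at 1 bit, bucket 1 split into buckets 1 and 3 at 2 bits:
buckets = {0: 1, 1: 2, 3: 2}
# Only the low 2 bits matter here, so checking values 0..3 covers all cases.
assert all(len(owners(v, buckets)) == 1 for v in range(4))
```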