Virtual sharding over buckets has a couple of strong points: data locality and a constant shard function. That significantly speeds up and simplifies rebalancing and requests for linked data.
But this relies on the bucket count being constant - that is the only thing that keeps the shard function constant. Or, actually, it was.
There is a way to dynamically change the number of actually stored buckets, proposed by @kostja and @alyapunov.
It is proposed to calculate the shard function just like now, but dynamically change the number of its bits that are actually used. For example, assume we use 10 bits of a 64-bit shard function value.
That gives 1024 buckets. Now assume it is not enough anymore and a user wants more. He says the new bucket count is 2048, and we start using 11 bits. Each existing bucket is split into 2 new buckets, which are then spread over the cluster by the rebalancer.
When this is also not enough, 12 bits are used, and so on.
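A minimal sketch of the idea, assuming the bucket id is taken from the low bits of the shard function value (the real vshard shard function is not shown here; the sha1-based hash below is a hypothetical stand-in, and using the low bits rather than the high ones is also just one possible choice):

```python
# Sketch: bucket id = low `bits` bits of a 64-bit shard function value.
# Raising `bits` by one doubles the bucket count and splits every bucket in two.
import hashlib

def shard_value(key: str) -> int:
    """64-bit shard function value; hypothetical stand-in for the real one."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], 'little')

def bucket_id(key: str, bits: int) -> int:
    """Bucket id is the low `bits` bits of the shard function value."""
    return shard_value(key) & ((1 << bits) - 1)

def split_targets(old_bucket: int, old_bits: int) -> tuple[int, int]:
    """When `bits` grows by one, bucket B splits into B and B + 2^old_bits."""
    return old_bucket, old_bucket + (1 << old_bits)

# With 10 bits there are 1024 buckets; with 11 bits, 2048.
key = 'some_key'
old = bucket_id(key, 10)
new = bucket_id(key, 11)
assert new in split_targets(old, 10)  # a key never leaves its bucket's split pair
```

Under this scheme no data has to be rehashed: the extra bit only decides into which of the two halves of its old bucket a key falls.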
In theory, decreasing the bucket count should work as the reverse of the algorithm above: we need to find which buckets are adjacent and merge them back together.
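Under the same assumptions, "adjacent" buckets are the two halves of a former bucket, i.e. B and B + 2^bits, and dropping one bit merges them back into B:

```python
# Sketch of the reverse step: going from `new_bits + 1` down to `new_bits`
# bits merges each pair of buckets B and B + 2^new_bits into bucket B.
def merge_source_pair(target_bucket: int, new_bits: int) -> tuple[int, int]:
    """Buckets whose data collapses into `target_bucket` after the decrease."""
    return target_bucket, target_bucket + (1 << new_bits)

# Going from 11 bits (2048 buckets) back to 10 bits (1024 buckets):
assert merge_source_pair(5, 10) == (5, 1029)  # buckets 5 and 1029 merge into 5
```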
Every storage should persist how many bits are currently used.
Probably that can even be applied to individual buckets if we store a bit count per bucket. It needs to be checked that this does not break the rule 'one shard function value = one bucket'.
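A hedged sketch of what that check could look like, assuming the same low-bits scheme: each bucket keeps its own bit count (similar to local depth in extendible hashing), and the rule holds iff every shard function value matches exactly one (bucket id, bits) pair:

```python
# Sketch: buckets is a map {bucket_id: bits}. The rule 'one shard function
# value = one bucket' holds iff every value has exactly one owner.
def owners(value: int, buckets: dict[int, int]) -> list[int]:
    """All buckets whose mask matches `value`; the rule requires exactly one."""
    return [b for b, bits in buckets.items() if value & ((1 << bits) - 1) == b]

# Bucket 0 kept at 1 bit, bucket 1 split into buckets 1 and 3 at 2 bits:
buckets = {0: 1, 1: 2, 3: 2}
# Only the low 2 bits matter here, so checking values 0..3 covers all cases.
assert all(len(owners(v, buckets)) == 1 for v in range(4))
```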