(I think this bug applies to all ZFS platforms, not just ZFS On Linux.)
There is a legitimate use case for having vdevs of unequal size in a pool, if the user only has a heterogeneous set of disks on hand. It sounds like ZFS should be able to handle this just fine, but in practice it doesn't: such a pool will become a ticking time bomb that will start behaving miserably as soon as a significant amount of data is poured into it.

Steps to reproduce:
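A minimal setup along these lines should do it (the backing-file locations and sizes below are my own illustrative guesses, chosen only so that d1 is about 5 times smaller than d2):

$ truncate -s 128M /var/tmp/d1      # small vdev (size is an illustrative guess)
$ truncate -s 640M /var/tmp/d2      # large vdev, 5x the size of d1
$ zpool create testspace /var/tmp/d1 /var/tmp/d2   # pool mounts at /testspace by default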
$ dd if=/dev/zero of=/testspace/d2 bs=1M count=384 # write 384M to pool

So far so good. Here's the ticking time bomb:
d1 is 5 times smaller than d2. So one might reasonably expect ZFS to do The Right Thing and simply allocate 76M on d1 and 308M on d2, keeping both vdevs at the same usage ratio (~60% in this case). Unfortunately, we are disappointed: ZFS instead skews the allocation heavily toward d1, filling it almost completely while d2 still has plenty of free space.
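The per-vdev split is easy to check; either of these gives a per-vdev breakdown (pool name as in the sketch above):

$ zpool list -v testspace     # per-vdev size, allocated and free space
$ zpool iostat -v testspace   # per-vdev allocation plus I/O counters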
It all goes downhill from there. Performance goes out the window, as most future writes end up on the single vdev that still has space left, d2. Adding insult to injury, ZFS desperately persists in trying to allocate blocks from d1 despite the fact that it's already 97% full, which is crazy and results in huge slowdowns because the allocator is struggling to find free space on d1.
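A simple way to watch both symptoms live, assuming the toy pool sketched above (the output file name is arbitrary): keep writing in the background while watching where the writes land:

$ dd if=/dev/zero of=/testspace/more bs=1M count=128 &
$ zpool iostat -v testspace 1   # refresh per-vdev write activity once a second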
This behavior is highly surprising, and I strongly believe it should be considered a bug. A quick Google search reveals no warnings about using heterogeneous vdevs in a zpool, so the user will probably only discover the problem when it's already too late, since the issues only start cropping up after the pool is populated with a large amount of data.
I suspect the problem lies in the metaslab allocator bias code. The comments on that code suggest it is supposed to equalize vdev utilization, but that only seems to work out for the "new empty vdev" use case, not for the "unequal vdev capacities" use case. My guess is that the math used in that code doesn't actually work out in the long term when vdev capacities differ.
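For reference, the code I mean lives in metaslab.c; the path and field name below are from the ZFS on Linux tree as far as I recall, so adjust for your platform:

$ grep -n -B5 -A5 'mg_bias' module/zfs/metaslab.c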