Disk space is not properly allocated when using unequal size vdevs #3389

Closed

dechamps opened this issue May 9, 2015 · 1 comment

@dechamps (Contributor) commented May 9, 2015

(I think this bug applies to all ZFS platforms, not just ZFS on Linux.)

There is a legitimate use case for having vdevs of unequal size in a pool, if the user only has a heterogeneous set of disks on hand. It sounds like ZFS should be able to handle this just fine, but in practice it doesn't: such a pool will become a ticking time bomb that will start behaving miserably as soon as a significant amount of data is poured into it.

Steps to reproduce:

$ dd if=/dev/zero of=/tmp/d1 bs=1 count=1 seek=134217728 # create 128M file
$ dd if=/dev/zero of=/tmp/d2 bs=1 count=1 seek=536870912 # create 512M file
$ zpool create testspace /tmp/d1 /tmp/d2
$ zpool iostat -v testspace
                                                   capacity
pool                                             alloc   free
-----------------------------------------------  -----  -----
testspace                                        68.5K   627M
  /tmp/d1                                          26K   123M
  /tmp/d2                                        42.5K   504M
-----------------------------------------------  -----  -----

So far so good. Here's the ticking time bomb:

 $ dd if=/dev/zero of=/testspace/d2 bs=1M count=384 # write 384M to pool

d1 holds roughly a fifth of the pool's capacity (it is a quarter the size of d2). So one might reasonably expect ZFS to do The Right Thing and allocate about 76M on d1 and 308M on d2, keeping both vdevs at the same usage ratio (~60% in this case). Unfortunately, we are disappointed:

 $ zpool iostat -v testspace
                                                    capacity
 pool                                             alloc   free
 -----------------------------------------------  -----  -----
 testspace                                         379M   248M
   /tmp/d1                                         119M  4.14M
   /tmp/d2                                         261M   243M
 -----------------------------------------------  -----  -----
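
For comparison, here is a quick back-of-the-envelope check of the "fair" split, a standalone sketch (not ZFS code) that just applies capacity-proportional arithmetic to the usable sizes reported by zpool iostat above:

```c
/* Capacity-proportional ("fair") split of a 384M write across the two
 * vdevs, using the usable sizes reported by `zpool iostat` above. */
#include <stdio.h>

int main(void)
{
	double d1_size = 123, d2_size = 504;   /* MiB usable per vdev */
	double written = 384;                  /* MiB written to the pool */
	double total = d1_size + d2_size;

	double d1_fair = written * d1_size / total;
	double d2_fair = written * d2_size / total;

	printf("fair: %.0fM on d1 (%.0f%% full), %.0fM on d2 (%.0f%% full)\n",
	    d1_fair, 100 * d1_fair / d1_size,
	    d2_fair, 100 * d2_fair / d2_size);
	return 0;
}
```

That works out to roughly 75M on d1 and 309M on d2, both around 61% full, whereas the output above shows d1 at roughly 97% full and d2 at only about 52%.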

It all goes downhill from there. Performance goes out the window, as most future writes end up on a single vdev, d2, since it is the only one with space left. Adding insult to injury, ZFS desperately persists in trying to allocate blocks from d1 even though it is already 97% full, which results in huge slowdowns because the allocator struggles to find free space on d1.

This behavior is highly surprising and I strongly believe it should be considered a bug. A quick Google search turns up no warnings against using heterogeneous vdevs in a zpool, so users will probably only discover the problem when it is already too late, since the issues only start cropping up once the pool holds a large amount of data.

I suspect the problem lies in the metaslab allocator bias code. The comments on that code suggest it is meant to equalize vdev utilization, but that only seems to work out for the "new empty vdev" use case, not for the "unequal vdev capacities" use case. I suspect the math in that code does not hold up in the long run when vdev capacities differ.
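
To illustrate why I think the math breaks down, here is a toy simulation of my understanding of that logic (a simplified sketch, not the actual allocator code): a rotor hands each vdev a fixed-size aliquot in turn, adjusted by the difference between pool utilization and vdev utilization, which is roughly what the bias comments describe. Capacities and write size match the reproduction above:

```c
/*
 * Toy model (not the real allocator): a rotor takes a fixed-size aliquot
 * from each vdev in turn and adjusts it by the difference between pool
 * utilization and vdev utilization, as the bias comments appear to describe.
 */
#include <stdio.h>
#include <stdint.h>

#define MiB (1024LL * 1024)

int main(void)
{
	int64_t space[2] = { 128 * MiB, 512 * MiB }; /* vdev sizes */
	int64_t alloc[2] = { 0, 0 };                 /* bytes allocated so far */
	int64_t aliquot = 512 * 1024;                /* chunk handed out per rotor stop */
	int64_t to_write = 384 * MiB;                /* total write, as in the repro */
	int64_t pool_space = space[0] + space[1];
	int64_t pool_alloc = 0;

	while (to_write > 0) {
		for (int v = 0; v < 2 && to_write > 0; v++) {
			int64_t vu = alloc[v] * 100 / space[v];     /* vdev % full */
			int64_t cu = pool_alloc * 100 / pool_space; /* pool % full */

			/* Over-full vdevs get a smaller share, under-full ones a larger one. */
			int64_t take = aliquot + (cu - vu) * aliquot / 100;

			if (take < 0)
				take = 0;
			if (take > space[v] - alloc[v])
				take = space[v] - alloc[v];
			if (take > to_write)
				take = to_write;

			alloc[v] += take;
			pool_alloc += take;
			to_write -= take;
		}
	}

	for (int v = 0; v < 2; v++)
		printf("vdev %d: %lld MiB of %lld MiB allocated (%lld%% full)\n",
		    v, (long long)(alloc[v] / MiB), (long long)(space[v] / MiB),
		    (long long)(alloc[v] * 100 / space[v]));
	return 0;
}
```

In this toy model the 128M vdev is driven all the way to 100% full while the 512M one ends up only about half used, which is essentially what the real pool above shows. The bias only grows in proportion to the utilization gap that has already built up, so with a 1:4 capacity ratio the small vdev has to get about 60 percentage points ahead of the pool average before its per-pass share shrinks to the fair level, and by then it is effectively full.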

@dechamps (Contributor, Author) commented:

I wrote a fix in #3391.
