zvol on RAIDZ2 takes up double the expected space #1807
Fedora 19, zfs 0.6.2 release, 6 disks in RAIDZ2 with SSD as L2ARC device.
Allocated pool space before experiment:
Create a 10G zvol:
Fully write the volume once:
Allocated pool space after write:
As can be seen, the difference is 32718569472 bytes (~30 GiB), double the expected amount of ~15 GiB.
"zfs get all" output of the zvol after writing:
NAME              PROPERTY              VALUE                       SOURCE
tank/backup/test  type                  volume                      -
tank/backup/test  creation              Tue Oct 22 23:12 2013       -
tank/backup/test  used                  20.3G                       -
tank/backup/test  available             4.88T                       -
tank/backup/test  referenced            20.3G                       -
tank/backup/test  compressratio         1.00x                       -
tank/backup/test  reservation           none                        default
tank/backup/test  volsize               10G                         local
tank/backup/test  volblocksize          8K                          -
tank/backup/test  checksum              on                          default
tank/backup/test  compression           off                         local
tank/backup/test  readonly              off                         default
tank/backup/test  copies                1                           default
tank/backup/test  refreservation        10.3G                       local
tank/backup/test  primarycache          all                         default
tank/backup/test  secondarycache        metadata                    inherited from tank/backup
tank/backup/test  usedbysnapshots       0                           -
tank/backup/test  usedbydataset         20.3G                       -
tank/backup/test  usedbychildren        0                           -
tank/backup/test  usedbyrefreservation  0                           -
tank/backup/test  logbias               latency                     default
tank/backup/test  dedup                 off                         default
tank/backup/test  mlslabel              none                        default
tank/backup/test  sync                  standard                    default
tank/backup/test  refcompressratio      1.00x                       -
tank/backup/test  written               20.3G                       -
tank/backup/test  snapdev               hidden                      default
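Note that even within this output `used` is 20.3G, while the raw pool allocation grew by ~30 GiB. A plausible reconciliation (an assumption on my part, based on how ZFS reports "deflated" space for raidz vdevs) is that dataset accounting divides raw allocation by the parity overhead of a full-width stripe, which for a 6-disk RAIDZ2 is 1.5x:

```python
# Hypothetical reconciliation of "used 20.3G" with the ~30 GiB raw growth.
# Assumption: ZFS "deflates" raidz allocation by the data fraction of a
# full-width stripe, i.e. 4 data disks / 6 disks for this RAIDZ2 pool.
raw_alloc = 32718569472              # pool allocation difference from the issue
deflate = 4 / 6                      # data fraction of a full-width stripe
used_bytes = raw_alloc * deflate
print(round(used_bytes / 2**30, 1))  # -> 20.3 (GiB), matching `used`
```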
As a second test, a 10 GB file was created on a ZFS filesystem in the same pool.
Allocated pool space before file creation:
Allocated pool space after file creation:
The difference is 16138665984 bytes (~15GB), the expected amount.
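The two measurements above can be sanity-checked with a little arithmetic (the byte counts are the ones reported in this issue; "10G" is assumed to mean 10 GiB):

```python
# Compare both experiments against the naive 1.5x RAIDZ2 expectation.
data = 10 * 2**30                       # 10 GiB written in each test
expected = data * 6 // 4                # 6-disk RAIDZ2: 1.5x raw overhead
zvol_diff = 32718569472                 # pool growth after writing the zvol
file_diff = 16138665984                 # pool growth after writing the file
print(round(zvol_diff / expected, 2))   # -> 2.03: zvol costs ~2x the expectation
print(round(file_diff / expected, 2))   # -> 1.0: file matches the expectation
```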
So here's the thing: each 8k volume block gets its own parity sectors. That's how RAID-Z works. If you have 4k AF disks (zdb shows "ashift: 12"), then each 8k block consists of two sectors of data plus two sectors of parity, and RAID-Z additionally pads each allocation to a multiple of (nparity + 1) sectors, turning those 4 sectors into 6. You're expecting that generally there are four sectors of data for every two of parity all the way across the pool. There aren't.
Switch to a block size of 16k or larger and you should be in better shape on space usage, at the expense of a worse read-modify-write cycle when smaller writes hit the volume. With a filesystem's default 128k recordsize you're already covered there.
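The per-block arithmetic above can be sketched as follows. This is an approximation of the RAID-Z allocation rule, not the actual OpenZFS code: parity sectors are added per stripe row, and the total is padded to a multiple of (nparity + 1) sectors:

```python
import math

def raidz_asize(psize, ndisks, nparity, ashift):
    """Approximate raw RAID-Z allocation for one block of psize bytes.

    Assumption: parity is added once per stripe row, and the total is
    padded up to a multiple of (nparity + 1) sectors.
    """
    sector = 1 << ashift
    nsect = psize // sector                       # data sectors in the block
    rows = math.ceil(nsect / (ndisks - nparity))  # stripe rows occupied
    total = nsect + rows * nparity                # data + parity sectors
    pad_unit = nparity + 1
    total = math.ceil(total / pad_unit) * pad_unit
    return total * sector

for bs in (8 << 10, 16 << 10, 128 << 10):
    asize = raidz_asize(bs, ndisks=6, nparity=2, ashift=12)
    print(f"{bs >> 10:>3}K block -> {asize >> 10}K on disk ({asize / bs:.1f}x)")
```

Under these assumptions an 8K volblocksize costs 3.0x raw space on this pool (consistent with the ~30 GiB observed for 10 GiB of data), while 16K and 128K blocks both come out at the expected 1.5x.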