New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large Block #2865
Large Block #2865
Conversation
This change is expected to be merged after the kmem-rework changes. I've tested it up to 8M block sizes and it has help together quite well. |
c2438cb
to
dd078c8
Compare
7359b09
to
d62632e
Compare
Refreshed against the latest master, testing continues to go well. Changes include:
|
d62632e
to
cff8fdc
Compare
I'm testing this feature, but find that when I set |
@bjin Thanks for the additional testing. I'm not able to case there issue you're describing with the current patch stack. Could you perhaps rebuild everything cleanly to verify what you're seeing. |
Looks like it's due to |
@bjin Right, that makes sense. When |
Everything works fine. It's not an exhaustive filesystem testing, but I run several network application (w/ mostly random disk IO) for hours without issues. |
Thus far I've observed the following two ASSERTS while stress testing these changes. It's not 100% clear that the large block patches introduced these problems, however thus far I haven't been able to reproduce them on master. Before this work can be merged these fails needs to be resolved.
For the above dnode_sync() case the specific values for this failure were |
@behlendorf regarding the 2 assertion failures you saw, the 1st, in arc_write_done(), might be fixed by removing these 2 lines from zio_write_bp_init(): - zio->io_bp_override = NULL;
- BP_ZERO(bp); We saw a similar failure and that was the fix. Note that this is unrelated to large block support. |
@behlendorf Is that possible to merge this into 0.6.4 if these two assertion failure are proven to be unrelated to changes from this pull request? I'm planing creating a large raidz array with 4k sector disks, and large block feature will help avoiding space overhead and improving overall performance. |
@ahrens thanks for the hint! @bjin before I can merge the large block changes we need to explain these assertion failures. It sounds like the arc_write_done() one is a long standing issue which we can address in another patch. The only way I've been able to reproduce either of these issues (and it isn't easy) are with some stress tests put together by @nedbass. If you can hit the dnode_sync() issue using just the master branch then we could be confident that these changes aren't introducing this bug. |
I have a feature request for this PR: to allow users to use bootfs (rpool) pools with large blocks when GRUB isn't interacting with ZFS directly. Aside from systems converted from FreeBSD, we dont have anything in our environment, or client environments, thats sitting on physical disks - most everything we do in production is beholden to some compliance regulations and we have to encrypt the substrate (since Oracle refuses to return any response on inquiries about the legal state of the possible leak or willful disclosure of what Rogue keeps updating so graciously). Since we're using dm-crypt like this, the /boot partitions are not ZFS anyway, and initrd handles loading the zpool after decrypting the underlying volumes. Until the OpenZFS community figures out how we want to deal with ZFS-native cryptography, key management, and all the fun alongside those pieces, we're going to have to keep doing this to ensure cross platform compat and physical security, while the pools are only growing. While most of the dedicated storage platforms we build do not use an rpool, we're starting to do a reasonable number of application and intermediate systems on ZFS which benefit from a rational implementation of OS snapshots (without the BTRFS "feature" of almost guaranteed data loss or LVM's allocation requirements and shaky restores), and it really wouldn't hurt to have ARC cache appropriate OS blocks in MFU on storage systems since the OS volumes are often encrypted USB sticks/flash cards. |
3654 zdb should print number of ganged blocks 3656 remove unused function zap_cursor_move_to_key() Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/3654 https://www.illumos.org/issues/3656 illumos/illumos-gate@d5ee8a1 Porting Notes: 3655 and 3657 were part of this commit but those hunks were dropped since they apply to mdb. Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
3897 zfs filesystem and snapshot limits Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3897 illumos/illumos-gate@a2afb61 Porting Notes: The nm and dsname variables in dsl_dataset_snapshot_check() were relocated from the stack to the heap. This was flagged by gcc because it increased the stack frame size to >4K. Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
3897 zfs filesystem and snapshot limits (fix leak) Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3897 illumos/illumos-gate@fb7001f Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
4901 zfs filesystem/snapshot limit leaks Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4901 illumos/illumos-gate@adf3407 Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
4373 add block contents print to zstreamdump Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/4373 illumos/illumos-gate@994fb6b Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
4951 ZFS administrative commands should use reserved space, not with ENOSPC Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4373 illumos/illumos-gate@7d46dc6 Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
4951 ZFS administrative commands should use reserved space, not fail with ENOSPC Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/4951 illumos/illumos-gate@c39f2c8 Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
@kpande absolutely, refreshed against master. This patch is expected to be merged right after the tag. The test failures which were observed above have been resolved in master and were unrelated to this change. I've dropped the patch to consolidate the data and meta data caches from this patch stack because this change and the ABD patches are both expected to be in the next tag making it unnecessary. |
@kpande That's not going to be easy until one of them gets merged and the other rebased on to it. For now you may need to just test one or the other. |
5027 zfs large block support Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5027 illumos/illumos-gate@b515258 Porting Notes: Included in this patch is a tiny ISP2() cleanup in zio_init() from Illumos 5255. Additionally, unlike the upstream Illumos commit this patch does not impose an arbitrary 128K block size limit on ZVOLs. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Improve the large block feature test coverage by extending ztest to frequently change the recordsize. This is specificially designed to catch corner cases which might otherwise go unnoticed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Refreshed with the 128K limit on ZVOLs removed. Multiple people have requested that this limit be removed since it was somewhat arbitrary. It's still not 100% clear that blocks larger than 128K are a good idea for ZVOLs due to the read-modify-write overhead but removing the limit at least allows us to test it. |
@kpande Thanks for the additional testing and sign-off. I hope to get this work finalized and merged next week. |
Please be aware of this issue https://www.illumos.org/issues/5393 when you import large blocks support. |
@avg-I thanks for calling that additional patch out. We'll merge it at the same time as these changes. Out of curiosity do you know of any other issues with the large block patches? I ask because I may have observed issues when testing with block sizes over 1MB. |
@behlendorf I haven't actually used larger blocks yet, just had the feature enabled. |
Refreshed against master and replaced by #3363. |
Refreshed support for large blocks based on the OpenZFS implementing which is currently undergoing review and testing. This patch stack passes all automated testing and basic functional testing. This patch stack would benefit from additional manual testing. See the upstream Illumos review for additional details.
https://reviews.csiden.org/r/51/