Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large Block #2865

Closed
wants to merge 9 commits into from
Closed

Large Block #2865

wants to merge 9 commits into from

Conversation

behlendorf
Copy link
Contributor

Refreshed support for large blocks based on the OpenZFS implementing which is currently undergoing review and testing. This patch stack passes all automated testing and basic functional testing. This patch stack would benefit from additional manual testing. See the upstream Illumos review for additional details.

https://reviews.csiden.org/r/51/

@behlendorf behlendorf added this to the 0.7.0 milestone Nov 4, 2014
@behlendorf behlendorf added the Type: Feature Feature request or new feature label Nov 4, 2014
@adilger adilger mentioned this pull request Nov 17, 2014
@behlendorf
Copy link
Contributor Author

This change is expected to be merged after the kmem-rework changes. I've tested it up to 8M block sizes and it has help together quite well.

@behlendorf
Copy link
Contributor Author

Refreshed against the latest master, testing continues to go well. Changes include:

  • Addition of the ZFS filesystem/snapshop patches which were omitted in the initial port. These aren't related to large block but pulling them in first minimizes merge conflicts.
  • Merged zio_buf_cache and zio_data_buf_cache as a Linux optimization.
  • Ztest patch to change the recordsize every 1s during operation to improve test coverage.
  • Exported spa_blocksize()
  • Xattrs stored in spill blocks limited to 128K for now.

@behlendorf behlendorf changed the title Large Block (WIP) Large Block Jan 29, 2015
@bjin
Copy link

bjin commented Feb 3, 2015

I'm testing this feature, but find that when I set zfs set recordsize=1m zroot, zfs complains with message internal error: Numerical argument out of domain. But settings zfs set recordsize=1m zroot/home works fine as showed by zfs get recordsize. Is there any reason regarding this? Is it intended?

@behlendorf
Copy link
Contributor Author

@bjin Thanks for the additional testing. I'm not able to case there issue you're describing with the current patch stack. Could you perhaps rebuild everything cleanly to verify what you're seeing.

@bjin
Copy link

bjin commented Feb 4, 2015

Looks like it's due to zroot being flagged as bootfs in zpool properties, so I guess it's intended behavior. (Although the error message is bit weird)

@behlendorf
Copy link
Contributor Author

@bjin Right, that makes sense. When bootfs is set it's inadvisable to enable this because utilities such as GRUB would need to be updated to understand the larger blocks. You're right we could use a better error message. Aside from this have you noticed any issues?

@bjin
Copy link

bjin commented Feb 4, 2015

Everything works fine. It's not an exhaustive filesystem testing, but I run several network application (w/ mostly random disk IO) for hours without issues.

@behlendorf behlendorf modified the milestones: 0.6.5, 0.6.4 Feb 4, 2015
@behlendorf
Copy link
Contributor Author

Thus far I've observed the following two ASSERTS while stress testing these changes. It's not 100% clear that the large block patches introduced these problems, however thus far I haven't been able to reproduce them on master. Before this work can be merged these fails needs to be resolved.

VERIFY(((((zio->io_bp)->blk_prop) >> (62)) & ((1ULL << (1)) - 1))) failed
PANIC at arc.c:3828:arc_write_done()
VERIFY(0) failed
PANIC at dnode_sync.c:621:dnode_sync()

For the above dnode_sync() case the specific values for this failure were 1 > 1 || 0 || 0 || 266240 == 335872.

@sempervictus sempervictus mentioned this pull request Feb 5, 2015
@ahrens
Copy link
Member

ahrens commented Feb 6, 2015

@behlendorf regarding the 2 assertion failures you saw, the 1st, in arc_write_done(), might be fixed by removing these 2 lines from zio_write_bp_init():

-        zio->io_bp_override = NULL;
-        BP_ZERO(bp);

We saw a similar failure and that was the fix. Note that this is unrelated to large block support.

@bjin
Copy link

bjin commented Feb 12, 2015

@behlendorf Is that possible to merge this into 0.6.4 if these two assertion failure are proven to be unrelated to changes from this pull request? I'm planing creating a large raidz array with 4k sector disks, and large block feature will help avoiding space overhead and improving overall performance.

@behlendorf
Copy link
Contributor Author

@ahrens thanks for the hint!

@bjin before I can merge the large block changes we need to explain these assertion failures. It sounds like the arc_write_done() one is a long standing issue which we can address in another patch.
But the dnode_sync() one in particular needs to be explained because it indicates the block pointer being written isn't valid. I don't see how the large block changes could have caused this, and it's certainly possible this isn't a new regression but we need to find out.

The only way I've been able to reproduce either of these issues (and it isn't easy) are with some stress tests put together by @nedbass. If you can hit the dnode_sync() issue using just the master branch then we could be confident that these changes aren't introducing this bug.

https://github.com/nedbass/zfsstress

@sempervictus
Copy link
Contributor

I have a feature request for this PR: to allow users to use bootfs (rpool) pools with large blocks when GRUB isn't interacting with ZFS directly.

Aside from systems converted from FreeBSD, we dont have anything in our environment, or client environments, thats sitting on physical disks - most everything we do in production is beholden to some compliance regulations and we have to encrypt the substrate (since Oracle refuses to return any response on inquiries about the legal state of the possible leak or willful disclosure of what Rogue keeps updating so graciously). Since we're using dm-crypt like this, the /boot partitions are not ZFS anyway, and initrd handles loading the zpool after decrypting the underlying volumes.

Until the OpenZFS community figures out how we want to deal with ZFS-native cryptography, key management, and all the fun alongside those pieces, we're going to have to keep doing this to ensure cross platform compat and physical security, while the pools are only growing. While most of the dedicated storage platforms we build do not use an rpool, we're starting to do a reasonable number of application and intermediate systems on ZFS which benefit from a rational implementation of OS snapshots (without the BTRFS "feature" of almost guaranteed data loss or LVM's allocation requirements and shaky restores), and it really wouldn't hurt to have ARC cache appropriate OS blocks in MFU on storage systems since the OS volumes are often encrypted USB sticks/flash cards.

ahrens and others added 3 commits March 30, 2015 11:24
3654 zdb should print number of ganged blocks
3656 remove unused function zap_cursor_move_to_key()
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/3654
  https://www.illumos.org/issues/3656
  illumos/illumos-gate@d5ee8a1

Porting Notes:

3655 and 3657 were part of this commit but those hunks were dropped
since they apply to mdb.

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
3897 zfs filesystem and snapshot limits
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3897
  illumos/illumos-gate@a2afb61

Porting Notes:

The nm and dsname variables in dsl_dataset_snapshot_check() were
relocated from the stack to the heap.  This was flagged by gcc
because it increased the stack frame size to >4K.

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
3897 zfs filesystem and snapshot limits (fix leak)
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3897
  illumos/illumos-gate@fb7001f

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
jjelinek and others added 4 commits March 30, 2015 11:24
4901 zfs filesystem/snapshot limit leaks
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/4901
  illumos/illumos-gate@adf3407

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
4373 add block contents print to zstreamdump
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  https://www.illumos.org/issues/4373
  illumos/illumos-gate@994fb6b

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
4951 ZFS administrative commands should use reserved space, not with ENOSPC
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/4373
  illumos/illumos-gate@7d46dc6

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
4951 ZFS administrative commands should use reserved space, not fail with ENOSPC
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/4951
  illumos/illumos-gate@c39f2c8

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
@behlendorf
Copy link
Contributor Author

@kpande absolutely, refreshed against master. This patch is expected to be merged right after the tag. The test failures which were observed above have been resolved in master and were unrelated to this change. I've dropped the patch to consolidate the data and meta data caches from this patch stack because this change and the ABD patches are both expected to be in the next tag making it unnecessary.

@behlendorf
Copy link
Contributor Author

@kpande That's not going to be easy until one of them gets merged and the other rebased on to it. For now you may need to just test one or the other.

ahrens and others added 2 commits March 30, 2015 14:36
5027 zfs large block support
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5027
  illumos/illumos-gate@b515258

Porting Notes:

Included in this patch is a tiny ISP2() cleanup in zio_init() from
Illumos 5255.  Additionally, unlike the upstream Illumos commit this
patch does not impose an arbitrary 128K block size limit on ZVOLs.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Improve the large block feature test coverage by extending ztest
to frequently change the recordsize.  This is specificially designed
to catch corner cases which might otherwise go unnoticed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@behlendorf
Copy link
Contributor Author

Refreshed with the 128K limit on ZVOLs removed. Multiple people have requested that this limit be removed since it was somewhat arbitrary. It's still not 100% clear that blocks larger than 128K are a good idea for ZVOLs due to the read-modify-write overhead but removing the limit at least allows us to test it.

@behlendorf
Copy link
Contributor Author

@kpande Thanks for the additional testing and sign-off. I hope to get this work finalized and merged next week.

@avg-I
Copy link
Contributor

avg-I commented Apr 24, 2015

Please be aware of this issue https://www.illumos.org/issues/5393 when you import large blocks support.

@behlendorf
Copy link
Contributor Author

@avg-I thanks for calling that additional patch out. We'll merge it at the same time as these changes. Out of curiosity do you know of any other issues with the large block patches? I ask because I may have observed issues when testing with block sizes over 1MB.

@avg-I
Copy link
Contributor

avg-I commented Apr 24, 2015

@behlendorf I haven't actually used larger blocks yet, just had the feature enabled.

@behlendorf
Copy link
Contributor Author

Refreshed against master and replaced by #3363.

@behlendorf behlendorf closed this Apr 30, 2015
@behlendorf behlendorf deleted the largeblock branch April 19, 2021 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Feature request or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants