
TRIM/Discard support from Nexenta #3656

Closed
wants to merge 8 commits into
base: master
from

Conversation

@dweeezil
Member

dweeezil commented Aug 2, 2015

This patch stack includes Nexenta's support for TRIM/Discard on disk and file vdevs as well as an update to the dkio headers for appropriate Solaris compatibility. It requires the current https://github.com/dweeezil/spl/tree/ntrim patch in order to compile properly.

The usual disclaimers apply at this point: I've performed moderate testing with ext4-backed file vdevs and light testing with SSD-backed disk vdevs and it appears to work properly. Use at your own risk. It may DESTROY YOUR DATA! I'm posting the pull request because it seems to work during initial testing and I'd like the buildbots to get a chance at it (which I'm expecting to fail unless they use the corresponding SPL code).

The initial TRIM support (currently in commit 719301c) caused frequent deadlocks in ztest due to the SCL_ALL spa locking during the trim operations. The follow-on patch to support on-demand trim changed the locking scheme and I'm no longer seeing deadlocks with either ztest or normal operation.

The final commit (currently 9e5cfd7) adds ZIL logging for zvol trim operations. This code was mostly borrowed from an older Nexenta patch (referenced in the commit log) and has been merged into the existing zvol trim function.

In order to enable the feature, you must run zpool set autotrim=on on the pool and the zfs_trim module parameter must be set to 1 (which is its default value). The zfs_trim parameter controls the lower-level vdev trimming, whereas the pool property controls it at a higher level. By default, trims are batched and only applied every 32 transaction groups, as controlled by the new zfs_txgs_per_trim parameter. This allows zpool import -T to continue to be useful. Finally, by default, only regions of at least 1MiB are trimmed, as set by the zfs_trim_min_ext_sz module parameter.
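
For illustration, a minimal enabling sequence might look like the following (a sketch only; "tank" is a placeholder pool name, and the /sys/module paths assume the usual ZFS on Linux module-parameter layout):

    # enable batched/automatic TRIM on the pool
    zpool set autotrim=on tank
    zpool get autotrim tank
    # verify the lower-level vdev trimming is enabled (1 is the default here)
    cat /sys/module/zfs/parameters/zfs_trim
    # inspect the batching and minimum-extent parameters described above
    cat /sys/module/zfs/parameters/zfs_txgs_per_trim
    cat /sys/module/zfs/parameters/zfs_trim_min_ext_sz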

@dweeezil dweeezil changed the title from WIP - TIM/Discard support from Nexenta to WIP - TRIM/Discard support from Nexenta Aug 2, 2015

@dweeezil dweeezil referenced this pull request Aug 2, 2015

Closed

SATA trim for vdev members #598

@edillmann

Contributor

edillmann commented Aug 4, 2015

Hi @dweeezil

FYI: zpool trim rpool triggers the following warning

[ 61.844048] Large kmem_alloc(101976, 0x1000), please file an issue at:
[ 61.844048] https://github.com/zfsonlinux/zfs/issues/new
[ 61.844053] CPU: 3 PID: 392 Comm: spl_system_task Tainted: P OE 3.19.0 #4
[ 61.844054] Hardware name: Dell Inc. Dell Precision M3800/Dell Precision M3800, BIOS A07 10/14/2014
[ 61.844055] 000000000000c2d0 ffff88041557fc48 ffffffff81732ffb 0000000000000001
[ 61.844057] 0000000000000000 ffff88041557fc88 ffffffffa0209b73 ffff880400000000
[ 61.844059] 00000000000018e3 ffff880409ef9c00 ffff880409ef9c00 ffff8802c42d08c0
[ 61.844061] Call Trace:
[ 61.844068] [] dump_stack+0x45/0x57
[ 61.844091] [] spl_kmem_zalloc+0x113/0x180 [spl]
[ 61.844128] [] zio_trim+0x79/0x1b0 [zfs]
[ 61.844147] [] metaslab_exec_trim+0xa6/0xf0 [zfs]
[ 61.844166] [] metaslab_trim_all+0x10c/0x1a0 [zfs]
[ 61.844188] [] vdev_trim_all+0x13d/0x310 [zfs]
[ 61.844194] [] taskq_thread+0x205/0x450 [spl]
[ 61.844198] [] ? wake_up_state+0x20/0x20
[ 61.844203] [] ? taskq_cancel_id+0x120/0x120 [spl]
[ 61.844206] [] kthread+0xd2/0xf0
[ 61.844208] [] ? kthread_create_on_node+0x180/0x180
[ 61.844210] [] ret_from_fork+0x7c/0xb0
[ 61.844212] [] ? kthread_create_on_node+0x180/0x180

@sempervictus

Contributor

sempervictus commented Sep 9, 2015

@dweeezil: sorry to be a pain, but I'm curious to know the status on this - we've got a few SSD-only pools to play with for a few days before we stuff 'em into prod (making sure our hardware doesn't screw us), so we can do a bit of testing on this without losing production data if you happen to have some test paths for us to run through. Thanks

@dweeezil

Member

dweeezil commented Sep 9, 2015

@sempervictus I'm actively looking for feedback. The patch does need to be refreshed against a current master codebase which I'll try to do today. There's a bit of interference with the recent zvol improvements.

In my own testing, the patch does appear to work properly, although the behavior of the TRIM "batching" needs a bit better documentation and, possibly, a slightly different implementation (IIRC, one or both of the parameters only take effect at module load and/or pool import). I'd also like to add some kstats to help monitor its behavior.

I've used the on-demand TRIM quite a bit and it seems to work perfectly. You can TRIM a pool with zpool trim <pool> and monitor its progress with zpool status (although I'd expect it to be pretty instantaneous on most SSDs). Most of my testing, however, has used file-based vdevs but I have also used real SSDs. I just got a couple of new SSDs today which I plan on using for a bit more extensive testing.
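
As a concrete sketch of that workflow (the pool name is a placeholder):

    # kick off an on-demand TRIM of the whole pool
    zpool trim tank
    # progress is reported as part of the normal status output
    zpool status tank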

There's also a backport to 0.6.4.2 in a branch named "ntrim-0.6.4.2" ("ntrim-0.6.4.1" for SPL).

@sempervictus

Contributor

sempervictus commented Sep 10, 2015

As soon as this is updated to reflect changes in master we'll add it to our stack. One potential caveat is that we generally utilize dm-crypt with the discard option at mount time. Any thoughts on potential side effects from this? Has this sort of setup been tested in any way?
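
For context, a dm-crypt-backed vdev only passes discards down if the mapping was opened with discards allowed; a hedged sketch of such a setup (device and mapping names are placeholders, and the usual security caveats about discards on encrypted volumes apply):

    # open the LUKS container with discard pass-through enabled
    cryptsetup luksOpen --allow-discards /dev/sdb1 crypt_vdev
    # the pool is then built on the mapped device
    zpool create tank /dev/mapper/crypt_vdev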

@dweeezil dweeezil changed the title from WIP - TRIM/Discard support from Nexenta to TRIM/Discard support from Nexenta Sep 14, 2015

@edillmann

Contributor

edillmann commented Sep 14, 2015

Hi @dweeezil,

Just to let you know I have been running this pull request since it was released, and besides the kmem_alloc warning, I did not see any problem or corruption on my test zpool (dual SSD mirror). The system has been crunching video camera recordings for 2 months :-)

Is there any hope of having it rebased on master?

@sempervictus

Contributor

sempervictus commented Sep 14, 2015

Tried to throw this into our stack today and noticed it has some conflicts with ABD in the raidz code. Rumor has it that should be merged "soon after the 0.6.5 tag", so I'm hoping that by the next rebase it'll be in there (nudge @behlendorf) :).

@edillmann

Contributor

edillmann commented Sep 14, 2015

@dweeezil I didn't see that it was already rebased, thanks.

@Mic92

Contributor

Mic92 commented Oct 18, 2015

So SATA TRIM is currently not supported, according to the comments in the source, or is this handled by SPL properly?

@dweeezil

Member

dweeezil commented Oct 18, 2015

@Mic92 SATA TRIM works just fine and I've tested it plenty. The documentation is still the original from Illumos. TRIM will work on any block device vdev supporting BLKDISCARD or any file vdev on which the containing filesystem supports fallocate hole punching.
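
A quick way to check whether a given block device even advertises discard support before testing (standard Linux tooling, not part of this patch; the device name is a placeholder):

    # non-zero DISC-GRAN/DISC-MAX values mean the device accepts discards
    lsblk --discard /dev/sda
    # equivalent check via sysfs
    cat /sys/block/sda/queue/discard_max_bytes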

@greg-hydrogen

greg-hydrogen commented Oct 18, 2015

@dweeezil - I would love to test this patch but it conflicts with the ABD branch (pull 3441) in vdev_raidz.c.

I have the .rej files from both sides (ABD applied first then this patch, and then this patch first then ABD) if that helps at all.

Much appreciated.

@skiselkov

Contributor

skiselkov commented Oct 29, 2015

Hey guys, I wanted to get your take on the latest submission on this that we're trying to get upstreamed from Nexenta. I'd primarily like to make the bottom end of the ZFS portion more accommodating to Linux & FreeBSD.
If you could, please drop by and take a look at https://reviews.csiden.org/r/263/

@dweeezil

Member

dweeezil commented Nov 4, 2015

@greg-hydrogen I tried transplanting the relevant commits onto ABD a while ago and, other than the bio argument issues, the main conflict is the logging I added to discards on zvols. The vdev conflicts you likely ran into are pretty easy to fix. I'll try to get an ABD-based version of this working within the next few days.

@skiselkov I'll check it out. It looks to be a port of the same Nexenta code in this pull request, correct?

@skiselkov

Contributor

skiselkov commented Nov 4, 2015

@dweeezil It is indeed, with some minor updates & fixes.

@vaLski

vaLski commented Nov 10, 2015

Can't compile it on CentOS 6.7. Attempted to install

spl-0.6.5.3 from github
zfs-0.6.5.3 from github + dweeezil:ntrim / #3656

I followed the section "If, instead you would like to use the GIT version, use the following commands instead:" from http://zfsonlinux.org/generic-rpm.html.

The last step, make rpm-utils rpm-dkms, fails with:

Preparing... ########################################### [100%]
1:zfs-dkms ########################################### [100%]
Removing old zfs-0.6.5 DKMS files...


Deleting module version: 0.6.5

completely from the DKMS tree.

Done.
Loading new zfs-0.6.5 DKMS files...
Building for 2.6.32-573.7.1.el6.x86_64
Building initial module for 2.6.32-573.7.1.el6.x86_64
Error! Bad return status for module build on kernel: 2.6.32-573.7.1.el6.x86_64 (x86_64)
Consult /var/lib/dkms/zfs/0.6.5/build/make.log for more information.
warning: %post(zfs-dkms-0.6.5-36_gafb9fad.el6.noarch) scriptlet failed, exit status 10

Log says

CC [M] /var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.o
/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c:36:33: error: sys/dkioc_free_util.h: No such file or directory
/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c: In function 'vdev_disk_io_start':
/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c:693: error: 'DKIOCFREE' undeclared (first use in this function)
/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c:693: error: (Each undeclared identifier is reported only once
/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c:693: error: for each function it appears in.)
/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c:696: error: 'dkioc_free_list_t' undeclared (first use in this function)
/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c:696: error: 'dfl' undeclared (first use in this function)
make[5]: *** [/var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.o] Error 1
make[4]: *** [/var/lib/dkms/zfs/0.6.5/build/module/zfs] Error 2
make[3]: *** [module/var/lib/dkms/zfs/0.6.5/build/module] Error 2
make[3]: Leaving directory `/usr/src/kernels/2.6.32-573.7.1.el6.x86_64'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/var/lib/dkms/zfs/0.6.5/build/module'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/lib/dkms/zfs/0.6.5/build'
make: *** [all] Error 2

strace shows

[pid 20344] open("include/sys/dkioc_free_util.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
[pid 20344] open("/usr/src/kernels/2.6.32-573.7.1.el6.x86_64/arch/x86/include/sys/dkioc_free_util.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
[pid 20344] open("/var/lib/dkms/zfs/0.6.5/build/include/sys/dkioc_free_util.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
[pid 20344] open("/usr/src/spl-0.6.5/include/sys/dkioc_free_util.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
[pid 20344] open("/usr/src/spl-0.6.5/sys/dkioc_free_util.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
[pid 20344] open("/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include/sys/dkioc_free_util.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
[pid 20344] write(2, "sys/dkioc_free_util.h: No such f"..., 48) = 48

Files are at

find / -name dkioc_free_util.h
/usr/include/libspl/sys/dkioc_free_util.h
/usr/src/zfs-0.6.5/lib/libspl/include/sys/dkioc_free_util.h
/usr/src/zfs/lib/libspl/include/sys/dkioc_free_util.h
/var/lib/dkms/zfs/0.6.5/build/lib/libspl/include/sys/dkioc_free_util.h

I symlinked dkioc_free_util.h into one of the "search" locations, but it started throwing other errors:

In file included from /var/lib/dkms/zfs/0.6.5/build/module/zfs/vdev_disk.c:36:
/usr/src/kernels/2.6.32-573.7.1.el6.x86_64/arch/x86/include/sys/dkioc_free_util.h:25: error: expected ')' before '*' token

Attempted to solve those, but to no avail.

It seems that dkioc_free_util.h is added by the "ntrim" branch, as the dfl_free function is referenced in module/zfs/zio.c and module/zfs/vdev_raidz.c where trim is mentioned.

I really hope that this is the correct place to post this issue, as it is directly related to this merge request.

Let me know if there is anything else I can assist with.

@dweeezil

Member

dweeezil commented Nov 10, 2015

@vaLski I suspect you need the corresponding spl patch from https://github.com/dweeezil/spl/tree/ntrim which I just rebased to master as dweeezil/spl@305d417.
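
For anyone hitting the same missing-header error, a hedged outline of the combination that is expected to build (branch names as given in this thread; the usual autotools prerequisites are assumed):

    git clone -b ntrim https://github.com/dweeezil/spl.git
    git clone -b ntrim https://github.com/dweeezil/zfs.git
    (cd spl && ./autogen.sh && ./configure && make && sudo make install)
    # point --with-spl at the spl source tree if configure does not find it automatically
    (cd zfs && ./autogen.sh && ./configure && make && sudo make install)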

@vaLski

vaLski commented Nov 10, 2015

@dweeezil sorry for overlooking this - you are right. Now everything is building just as expected. I will later report how the patch works for me. Initial testing seems fine.

@RichardSharpe

Contributor

RichardSharpe commented Nov 21, 2015

So, I have built this and we are testing it, but I encountered a problem.

If I run zpool trim on the same zpool twice, things seem to get stuck in SPL and things don't work anymore.

What can I do to debug this?

@dweeezil

Member

dweeezil commented Nov 21, 2015

@RichardSharpe A good start would be to get some stack traces from the blocked processes. Generally they'll show up in your syslog after several minutes. Otherwise, typically "echo b > /proc/sysrq-trigger" will cause them to be generated.

As to the trigger for your problem, was the first trim still running when you ran the second one? This type of hang sounds suspiciously like a taskq dispatch problem. Hopefully the stack traces from the blocked processes will shed some light on it.

@tomposmiko

tomposmiko commented Nov 21, 2015

@dweeezil

echo b will trigger a machine reboot :)

@Mic92

Contributor

Mic92 commented Nov 21, 2015

I could test this on an SSD RAID. Would this actually make a difference implementation-wise?

@dweeezil

Member

dweeezil commented Nov 21, 2015

@tomposmiko Blush, I meant echo w > /proc/sysrq-trigger, of course (ugh, long day).

@Mic92 There is differing support code for the different types of vdevs. In particular, raidz has its own particular support. I have personally tested with raidz, mirrors, stripes and file vdevs and all seemed to be properly trimmed.

As a reminder to anyone working with this, there are some new module parameters involved (as well as a new pool property which is also in the relevant man page):

       zfs_trim (int)
                   Controls whether the underlying vdevs of the pool are noti‐
                   fied when space is  freed  using  the  device-type-specific
                   command  set  (TRIM  here  being a general placeholder term
                   rather than referring to just the SATA TRIM command).  This
                   is frequently used on backing storage devices which support
                   thin provisioning or pre-erasure of blocks on flash media.

                   Default value: 0.

and

       zfs_trim_min_ext_sz (int)
                   Minimum size region in bytes over which  a  device-specific
                   TRIM  command  will  be  sent  to the underlying vdevs when
                   zfs_trim is set.

                   Default value: 1048576.

and

       zfs_txgs_per_trim (int)
                   Number of transaction  groups  over  which  device-specific
                   TRIM commands are batched when zfs_trim is set.

                   Default value: 32.

It's been a while but, IIRC, the latter two really ought to be set at module load time (with either kernel command-line parameters or with modprobe arguments after booting).

During my testing, after enabling the first one, of course, I generally set zfs_trim_min_ext_sz=0 so that even the smallest regions would be trimmed.

Also, IIRC, zfs_txgs_per_trim should not be set too low. Based on the existing logic, I think it makes no sense to set it lower than 2 (to "hurry along" the trim operations).
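
To make that concrete, a hedged example of setting these at module load time rather than at runtime (the values here are only illustrative, not recommendations):

    # /etc/modprobe.d/zfs.conf
    options zfs zfs_trim=1 zfs_txgs_per_trim=32 zfs_trim_min_ext_sz=0

    # or, when loading the module by hand
    modprobe zfs zfs_trim=1 zfs_txgs_per_trim=32 zfs_trim_min_ext_sz=0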

@dweeezil

Member

dweeezil commented Nov 21, 2015

I'll push a refresh shortly to fix the compile problem with debug builds.

Wait for trimming to finish in metaslab_fini
The new spa_unload() code added as part of "OpenZFS 7303 - dynamic metaslab
selection" (4e21fd0) would cause in-flight trim zios to fail.  This patch
makes sure each metaslab has finished trimming before removing it during
metaslab shutdown.
@dweeezil

Member

dweeezil commented Feb 20, 2017

The latest commit, c7654b5, is rebased on a current master. Also, I've re-worded a bunch of the documentation in the spirit of @ahrens' suggestions. This should also fix the panic upon export which started occurring due to 4e21fd0.

@sempervictus

Contributor

sempervictus commented Feb 28, 2017

Looks like we have a memory leak in the zpool trim command.
Reproducer showing different bytes are leaked when run without permissions:

#!/usr/bin/env ruby
require 'open3'
poolname = ARGV[0] || "tank"
while true
  Open3.capture3("zpool trim #{poolname}").tap {|o,e,i| puts e.index(':').to_s + ' bytes: ' + e[0..(e.index(':') -1)].each_byte.map { |b| b.to_s(16) }.join }
end

This is off the current revision, on Arch Linux in a Grsec/PAX environment using --with-pic=yes.
Out of 4287 executions, 4133 produce unique data...
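
(For reference, the reproducer takes the pool name as its first argument; assuming it is saved as trim_leak.rb (a hypothetical name), it would be run as: ruby trim_leak.rb tank)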

@dweeezil

Member

dweeezil commented Feb 28, 2017

@sempervictus The patch in 6c9f7af should fix this.

@sempervictus

Contributor

sempervictus commented Feb 28, 2017

@dweeezil: thanks, will add it in to the next set. I've got this running on the current test stack and I'm seeing some decent numbers for ZVOL performance atop an SSD with autotrim. If all goes well and it doesn't eat my data, I'll get this on some 10+-disk VDEV hardware soon enough. Any specific rough edges I should be testing?

@kpande

Member

kpande commented Mar 8, 2017

For me the main issue is now sending too many trim requests to the backend when it cannot reply in time; it will eventually cause I/O to the vdev to stop.

@skiselkov

Contributor

skiselkov commented Mar 8, 2017

Hey guys, just a heads up that the upstream PR has been significantly updated.

  • Trim zios are now processed via vdev_queue.c and limited to at most 10 executing at the same time per vdev.
  • Individual trim commands are capped at 262144 sectors (or 128MB), a recommendation from FreeBSD (see the quick check at the end of this comment).
  • The rate limiter in manual trimming is now much finer, working in at most 128MB increments rather than one metaslab at a time.
  • The zfs_trim tunable is now examined later, just before issuing trim commands from vdev_file/vdev_disk. This allows testing the entire pipeline up to that point.
  • I've built in changes suggested by Matt Ahrens regarding range_tree_clear and range_tree_contains.
  • The default minimum extent size to be trimmed has been reduced to 128k to catch smaller blocks in more fragmented metaslabs.

The most significant departure from what we have in-house at Nexenta is the zio queueing and the manual trim rate limiting. The remaining parts are largely conserved and we've been running them in production for over a year now.
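
(A quick sanity check on the 262144-sector cap in the second bullet, assuming 512-byte sectors: 262144 sectors × 512 bytes/sector = 134,217,728 bytes = 128 MiB, matching the stated 128MB limit.)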

@skiselkov

Contributor

skiselkov commented Mar 14, 2017

@dweeezil I'd really appreciate it if you could find time to drop by the OpenZFS PR for this and give it a look over: openzfs/openzfs#172
Sadly, I'm kinda short on reviewers. We want to share this with everybody, but if nobody steps up to the plate, then there's nothing we can do.

@dweeezil

Member

dweeezil commented Mar 14, 2017

@skiselkov Thanks for the poke. I'm definitely planning on going over the OpenZFS PR and also getting this one refreshed to match.

@skiselkov

Contributor

skiselkov commented Mar 14, 2017

@dweeezil Thanks, appreciate it a lot!

dweeezil added a commit to dweeezil/zfs that referenced this pull request Mar 19, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Mar 20, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.
@dweeezil

Member

dweeezil commented Mar 20, 2017

This PR has gotten way too long to deal with comfortably in GitHub. I've just done a complete refresh of the TRIM patch stack based on the upstream OpenZFS PR and rebased it to a current ZoL master. Once I've done some testing of the new stack, this PR will be closed and replaced with a new one.

In the meantime, the soon-to-be-posted PR is in dweeezil:ntrim-next-2.

@skiselkov Once I do some testing and post the new PR, I'll finally be able to start reviewing the OpenZFS PR. I tried to keep as many notes as I could about the issues I've had to deal with which might be applicable upstream.

@skiselkov

Contributor

skiselkov commented Mar 20, 2017

@dweeezil Thank you, appreciate it.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Mar 20, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Mar 22, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Mar 22, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Mar 23, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Mar 24, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

@dweeezil dweeezil referenced this pull request Mar 25, 2017

Open

OpenZFS - 6363 Add UNMAP/TRIM functionality #5925

1 of 11 tasks complete
@dweeezil

Member

dweeezil commented Mar 25, 2017

Replaced with #5925.

@dweeezil dweeezil closed this Mar 25, 2017

tuomari added a commit to tuomari/zfs that referenced this pull request Mar 30, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Apr 2, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Apr 2, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Apr 7, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

@behlendorf behlendorf removed this from In Progress in 0.7.0-rc4 Apr 8, 2017

dweeezil added a commit to dweeezil/zfs that referenced this pull request Apr 8, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Apr 14, 2017

Fix vdev_raidz_psize_floor()
The original implementation could overestimate the physical size
for raidz2 and raidz3 and cause too much trimming.  Update with the
implementation provided by @ironMann in #3656.