OpenZFS - 6363 Add UNMAP/TRIM functionality #5925

Closed
wants to merge 29 commits into from

@dweeezil
Member

commented Mar 25, 2017

Description

Add TRIM support. Replacement for #3656.

Motivation and Context

How Has This Been Tested?

Various stress testing with an assortment of vdev types.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Change has been approved by a ZFS on Linux member.

This PR should integrate all the recent changes in the upstream patch set. The stack also includes the separate fixes which were in #3656. It seems stable so far during some fairly abusive testing on SSDs with various types of vdevs. It does not include the "partial" trim support of the previous PR.

@mention-bot

commented Mar 25, 2017

@dweeezil, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @wca and @grwilson to be potential reviewers.

@dweeezil

Member Author

commented Mar 27, 2017

It looks like I need to work on the autotrim_001_pos test which looks to be the only thing failing in the automated testing.

@skiselkov

Contributor

commented Mar 27, 2017

@dweeezil Got any details on that failure?

@dweeezil

Member Author

commented Mar 27, 2017

@skiselkov I'm sure it's because the test is more than just a bit of a hack and not anything wrong with the TRIM code. I think there's some way to get the test suite error output from the bots but I'm going to have to dig into what that way is. The test did work for at least one of the bots.

@behlendorf

Member

commented Mar 27, 2017

The full test output is available from the log link.

@dweeezil

Member Author

commented Mar 27, 2017

It looks like the test threshold needs to be a bit more liberal:

16:56:50.81 NOTE: Size of /tmp/trim1.dev is 11 MB
16:56:50.82 
16:56:50.82 ERROR: test 11 -le 10 exited 1

I'd love to have the test script calculate these numbers in some rational way, but for the time being, it looks like a simple adjustment ought to get it passing on all platforms.

@skiselkov

Contributor

commented Mar 28, 2017

@dweeezil @behlendorf I'd appreciate if any of you guys could spare a moment to review openzfs/openzfs#172 . The sooner we get this pushed, the sooner I can get to upstreaming other work dependent on it (notably improved resilvering, which depends on some range_tree.c changes from TRIM).

@dweeezil

Member Author

commented Mar 28, 2017

@skiselkov Yes, absolutely. I just wanted to make sure there were no major issues with this PR against ZoL first and it does appear to be the case now.

@dweeezil

Member Author

commented Mar 28, 2017

@skiselkov I just finished a first-pass review which incorporates the issues discovered during the ZoL port as well as a couple of other minor things.

@fcrg

commented Apr 4, 2017

@dweeezil

I have reimaged my system and started testing the workflow I already looked at with PR #3656 (ntrim branch).

System information:

  • gentoo linux 4.7.10 updated to ntrim2-next (as mentioned in #5938)
  • a new pool created with: 4x SanDisk SD8SBAT2 3000 + 4x Samsung SSD 750
  • all SSD disks trimmed with sg_unmap
  pool: zfs-b47141e9-3807-43b9-829d-4e6bad8614ef
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: none requested
  trim: none requested
config:

	NAME                        STATE     READ WRITE CKSUM
	zfs-b47141e9-3807-43b9-829d-4e6bad8614ef  ONLINE       0     0     0
	  raidz2-0                  ONLINE       0     0     0
	    zfs-0x5001b444a4534e08  ONLINE       0     0     0
	    zfs-0x5001b444a4534963  ONLINE       0     0     0
	    zfs-0x5001b444a4534eca  ONLINE       0     0     0
	    zfs-0x5001b444a4534edf  ONLINE       0     0     0
	    zfs-0x5002538d701c9bc3  ONLINE       0     0     0
	    zfs-0x5002538d701ca675  ONLINE       0     0     0
	    zfs-0x5002538d701c5128  ONLINE       0     0     0
	    zfs-0x5002538d701ca602  ONLINE       0     0     0

errors: No known data errors

controller-c37acc6e ~ # df
Filesystem                                     1K-blocks      Used Available Use% Mounted on
zfs-b47141e9-3807-43b9-829d-4e6bad8614ef/data 1349526016   976256 1348549760   1% /opt/fast/exports/zfs_mount/b47141e9-3807-43b9-829d-4e6bad8614ef

workflow:
loop with
- create 500 Files with 1GByte random data
- delete the 500 Files
- show trim stats
- trigger trim on pool
- wait 60 seconds
- show trim stats
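
One iteration of the loop above could be scripted roughly as follows. This is only a sketch, not the script used for these results: the function names, the file naming, and the injectable `run`/`sleep` parameters are assumptions for illustration. The `zpool trim <pool>` command and the per-pool `trimstats` kstat file are the ones that appear elsewhere in this thread.

```python
import os
import subprocess
import time
from pathlib import Path


def write_random_file(path, size, chunk=1 << 20):
    """Write `size` bytes of random data to `path` in bounded chunks."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def run_cycle(data_dir, pool, kstat_path, files=500, file_bytes=1 << 30,
              wait_s=60, run=subprocess.run, sleep=time.sleep):
    """One iteration: write files, delete them, trim, and sample trimstats.

    Returns the kstat text read before and after the trim was triggered.
    The 1 GiB default is an assumption; the workflow only says "1GByte".
    """
    data_dir = Path(data_dir)
    paths = [data_dir / f"trimtest-{i:04d}.bin" for i in range(files)]
    for p in paths:
        write_random_file(p, file_bytes)
    for p in paths:
        p.unlink()

    before = Path(kstat_path).read_text()      # show trim stats
    run(["zpool", "trim", pool], check=True)   # trigger trim on pool
    sleep(wait_s)                              # wait 60 seconds
    after = Path(kstat_path).read_text()       # show trim stats again
    return before, after
```

Passing `run` and `sleep` as parameters keeps the sketch testable without a real pool; on a real system the defaults would invoke `zpool` directly.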

results:

  • ntrim2-next is as stable as ntrim

  • trim does not keep the write speed at its initial level

    • the first 4 loops are fine

    • the write speed decreases from 1GB/s to 250MB/s starting with loop 5

    • the trim speeds up from
      trim: completed on Tue Apr 4 09:09:58 2017 (after 0h4m)
      to
      trim: completed on Tue Apr 4 10:01:44 2017 (after 0h2m)

    • trimmed bytes of 500x1GB (trim stats before trim #2):

        bytes                           4    1906937624576
        bytes_skipped                   4    21376000
      
	Run 1
	After 100 files average write time is 924.620000 ms (last 100 files averaged at 925.100000 ms).
	After 400 files average write time is 962.140000 ms (last 100 files averaged at 980.920000 ms).
	1.03 GB/s
	  trim: none requested
	26 1 0x01 5 240 647228697797 1506291123363
	name                            type data
	extents                         4    0
	bytes                           4    0
	extents_skipped                 4    0
	bytes_skipped                   4    0
	auto_slow                       4    0
	  trim: 3.83%	started: Tue Apr  4 09:05:19 2017	(rate: max)
	  trim: 7.41%	started: Tue Apr  4 09:05:19 2017	(rate: max)
	  trim: 62.35%	started: Tue Apr  4 09:05:19 2017	(rate: max)
	26 1 0x01 5 240 647228697797 1576308926728
	name                            type data
	extents                         4    9452
	bytes                           4    1188874386432
	extents_skipped                 4    467
	bytes_skipped                   4    20981248
	auto_slow                       4    0

	Run 2
	After 100 files average write time is 1068.020000 ms (last 100 files averaged at 1068.400000 ms).
	After 400 files average write time is 1057.210000 ms (last 100 files averaged at 1044.890000 ms).
	955.49 MB/s
	  trim: completed on Tue Apr  4 09:09:58 2017 (after 0h4m)
	26 1 0x01 5 240 647228697797 2129754574869
	name                            type data
	extents                         4    14877
	bytes                           4    1906937624576
	extents_skipped                 4    478
	bytes_skipped                   4    21376000
	auto_slow                       4    0
	  trim: 3.52%	started: Tue Apr  4 09:15:43 2017	(rate: max)
	  trim: 7.89%	started: Tue Apr  4 09:15:43 2017	(rate: max)
	  trim: 60.30%	started: Tue Apr  4 09:15:43 2017	(rate: max)
	26 1 0x01 5 240 647228697797 2199772313076
	name                            type data
	extents                         4    24365
	bytes                           4    3057016177664
	extents_skipped                 4    1060
	bytes_skipped                   4    47407104
	auto_slow                       4    0

	Run 3
	After 100 files average write time is 1181.210000 ms (last 100 files averaged at 1181.590000 ms).
	After 400 files average write time is 1196.107500 ms (last 100 files averaged at 1133.580000 ms).
	848.11 MB/s
	  trim: completed on Tue Apr  4 09:19:47 2017 (after 0h4m)
	26 1 0x01 5 240 647228697797 2817946777241
	name                            type data
	extents                         4    29964
	bytes                           4    3759277403136
	extents_skipped                 4    1286
	bytes_skipped                   4    57543168
	auto_slow                       4    0
	  trim: 3.44%	started: Tue Apr  4 09:27:11 2017	(rate: max)
	  trim: 7.54%	started: Tue Apr  4 09:27:11 2017	(rate: max)
	  trim: 63.59%	started: Tue Apr  4 09:27:11 2017	(rate: max)
	26 1 0x01 5 240 647228697797 2887964719605
	name                            type data
	extents                         4    40268
	bytes                           4    4970564539904
	extents_skipped                 4    1903
	bytes_skipped                   4    82348544
	auto_slow                       4    0

	Run 4
	After 100 files average write time is 1189.800000 ms (last 100 files averaged at 1190.220000 ms).
	After 400 files average write time is 1201.217500 ms (last 100 files averaged at 1128.120000 ms).
	824.01 MB/s
	  trim: completed on Tue Apr  4 09:31:37 2017 (after 0h4m)
	26 1 0x01 5 240 647228697797 3523314912234
	name                            type data
	extents                         4    45771
	bytes                           4    5664869357056
	extents_skipped                 4    1992
	bytes_skipped                   4    83807744
	auto_slow                       4    0
	  trim: 5.84%	started: Tue Apr  4 09:38:56 2017	(rate: max)
	  trim: 10.21%	started: Tue Apr  4 09:38:56 2017	(rate: max)
	  trim: 48.33%	started: Tue Apr  4 09:38:56 2017	(rate: max)
	26 1 0x01 5 240 647228697797 3593331970183
	name                            type data
	extents                         4    53659
	bytes                           4    6585181988864
	extents_skipped                 4    2445
	bytes_skipped                   4    101315584
	auto_slow                       4    0

	Run 5
	After 100 files average write time is 1302.250000 ms (last 100 files averaged at 1302.600000 ms).
	After 400 files average write time is 1848.815000 ms (last 100 files averaged at 2993.290000 ms).
	459.89 MB/s
	  trim: completed on Tue Apr  4 09:43:03 2017 (after 0h4m)
	26 1 0x01 5 240 647228697797 4716482767122
	name                            type data
	extents                         4    61216
	bytes                           4    7514500459008
	extents_skipped                 4    2880
	bytes_skipped                   4    117384192
	auto_slow                       4    0
	  trim: 5.53%	started: Tue Apr  4 09:58:49 2017	(rate: max)
	  trim: 12.16%	started: Tue Apr  4 09:58:49 2017	(rate: max)
	  trim: 68.00%	started: Tue Apr  4 09:58:49 2017	(rate: max)
	26 1 0x01 5 240 647228697797 4786500234382
	name                            type data
	extents                         4    71949
	bytes                           4    8811289492992
	extents_skipped                 4    3354
	bytes_skipped                   4    132553216
	auto_slow                       4    0

	Run 6
	After 100 files average write time is 2958.750000 ms (last 100 files averaged at 2959.120000 ms).
	After 200 files average write time is 3632.720000 ms (last 100 files averaged at 4306.720000 ms).
	After 300 files average write time is 3958.830000 ms (last 100 files averaged at 4611.120000 ms).
	After 400 files average write time is 4199.092500 ms (last 100 files averaged at 4919.950000 ms).
	229.1 MB/s
	  trim: completed on Tue Apr  4 10:01:44 2017 (after 0h2m)
	26 1 0x01 5 240 647228697797 7016967381878
	name                            type data
	extents                         4    76610
	bytes                           4    9370406416896
	extents_skipped                 4    3877
	bytes_skipped                   4    154841088
	auto_slow                       4    0
	  trim: 5.28%	started: Tue Apr  4 10:37:10 2017	(rate: max)
	  trim: 9.87%	started: Tue Apr  4 10:37:10 2017	(rate: max)
	  trim: 69.14%	started: Tue Apr  4 10:37:10 2017	(rate: max)
	26 1 0x01 5 240 647228697797 7086984711161
	name                            type data
	extents                         4    87400
	bytes                           4    10688809731072
	extents_skipped                 4    4261
	bytes_skipped                   4    165418496
	auto_slow                       4    0

	Run 7
	After 100 files average write time is 5197.210000 ms (last 100 files averaged at 5197.620000 ms).
	After 200 files average write time is 5146.340000 ms (last 100 files averaged at 5095.510000 ms).
	After 300 files average write time is 4745.976667 ms (last 100 files averaged at 3945.310000 ms).
	After 400 files average write time is 4448.757500 ms (last 100 files averaged at 3557.150000 ms).
	231.29 MB/s
	  trim: completed on Tue Apr  4 10:39:44 2017 (after 0h2m)
	26 1 0x01 5 240 647228697797 9291998211837
	name                            type data
	extents                         4    92375
	bytes                           4    11262932804096
	extents_skipped                 4    4632
	bytes_skipped                   4    178885120
	auto_slow                       4    0
	  trim: 5.94%	started: Tue Apr  4 11:15:05 2017	(rate: max)
	  trim: 7.97%	started: Tue Apr  4 11:15:05 2017	(rate: max)
	  trim: 67.91%	started: Tue Apr  4 11:15:05 2017	(rate: max)
	26 1 0x01 5 240 647228697797 9362015205000
	name                            type data
	extents                         4    103221
	bytes                           4    12557969911808
	extents_skipped                 4    5230
	bytes_skipped                   4    199454208
	auto_slow                       4    0



zpool events -v
Apr  4 2017 09:05:19.652737494 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e345af 0x26e7fbd6 
        eid = 0x9

Apr  4 2017 09:09:58.055856245 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e346c6 0x3544c75 
        eid = 0xa

Apr  4 2017 09:15:43.106840563 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e3481f 0x65e41f3 
        eid = 0xb

Apr  4 2017 09:19:47.227599243 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e34913 0xd90e38b 
        eid = 0xc

Apr  4 2017 09:27:11.290653062 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e34acf 0x11530386 
        eid = 0xd

Apr  4 2017 09:31:37.804327659 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e34bd9 0x2ff110eb 
        eid = 0xe

Apr  4 2017 09:38:56.647230543 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e34d90 0x2693f44f 
        eid = 0xf

Apr  4 2017 09:43:03.993825649 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e34e87 0x3b3c9371 
        eid = 0x10

Apr  4 2017 09:58:49.804760364 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e35239 0x2ff7ab2c 
        eid = 0x11

Apr  4 2017 10:01:44.205130847 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e352e8 0xc3a0c5f 
        eid = 0x12

Apr  4 2017 10:37:10.269224926 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e35b36 0x100c0bde 
        eid = 0x13

Apr  4 2017 10:39:44.751240625 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e35bd0 0x2cc705b1 
        eid = 0x14

Apr  4 2017 11:15:05.282919364 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e36419 0x10dd01c4 
        eid = 0x15

Apr  4 2017 11:17:43.399592897 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0xa9923a64b19aeb6a
        pool_context = 0x0
        time = 0x58e364b7 0x17d14dc1 
        eid = 0x16
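
The `time` field of each event can be used to measure how long a trim took. The sketch below assumes the two hex words are seconds and nanoseconds since the epoch, which is consistent with the printed timestamps (0x26e7fbd6 is 652737494, the fractional part of the first event). The helper name `event_time_ns` is made up for illustration.

```python
import re

# Matches lines such as: "time = 0x58e345af 0x26e7fbd6"
TIME_RE = re.compile(r"time = (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)")


def event_time_ns(event_text):
    """Return the event time in nanoseconds since the epoch."""
    match = TIME_RE.search(event_text)
    if match is None:
        raise ValueError("no 'time = <sec> <nsec>' field found")
    sec, nsec = (int(word, 16) for word in match.groups())
    return sec * 1_000_000_000 + nsec


# First trim_start / trim_finish pair from the output above.
start = "        time = 0x58e345af 0x26e7fbd6 "
finish = "        time = 0x58e346c6 0x3544c75 "

elapsed_ns = event_time_ns(finish) - event_time_ns(start)
print(f"trim took {elapsed_ns / 1e9:.1f} s")
```

For this pair the difference is roughly 278 seconds, in line with the "after 0h4m" shown by zpool status for the first trim.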


@dweeezil dweeezil force-pushed the dweeezil:ntrim2 branch from dad1892 to 360651e Apr 7, 2017

@behlendorf behlendorf added this to In Progress in 0.7.0-rc4 Apr 8, 2017

@behlendorf

Member

commented Apr 8, 2017

Looks like the trim tests didn't run. Are they perhaps missing from the Makefile.am?

Warning: TestGroup 
'/var/lib/buildbot/slaves/zfs/Ubuntu_14_04_x86_64__TEST_/build/tests/zfstests/tests/functional/trim' not added to this run. Auxiliary script
'/var/lib/buildbot/slaves/zfs/Ubuntu_14_04_x86_64__TEST_/build/tests/zfstests/tests/functional/trim/setup' failed verification.

A rebase on master should get you past the Amazon xfstests failure.

I'm not sure what happened with the 32-bit build so I've resubmitted it.

@dweeezil

Member Author

commented Apr 8, 2017

@behlendorf AFAICT, the necessary things are in the various Makefile.am files. I was getting that "failed verification" error, too, after redoing the newer test suite stuff, and it turned out that I was missing execute permissions on the files, which do appear to be there in the latest commit. In any case, I'll push a rebased and slightly squashed version and see what happens.

@dweeezil dweeezil force-pushed the dweeezil:ntrim2 branch 2 times, most recently from ca90ef0 to 5f90dce Apr 8, 2017

@fcrg

commented Apr 10, 2017

@dweeezil

rerunning the above test with

  • changing the minimum extent size to zero ( zfs_trim_min_ext_sz = 0; )

  • using the whole 8 disks (removing the device mapper which is per default present in our system)

shows the same result:

Run 1 - 1.02 GB/s
..
Run 5 - 470 MB/s

@fcrg

commented Apr 10, 2017

@dweeezil

Running zpool trim on an empty pool increases trimstats

If I run zpool trim on an empty pool (no data has been written or deleted), the trimstats increase with every run:

Apr 10 2017 08:51:00.612198185 sysevent.fs.zfs.pool_create
        version = 0x0
        class = "sysevent.fs.zfs.pool_create"
        pool_guid = 0x6af05852217fb73a
        pool_context = 0x6
        time = 0x58eb2b54 0x247d6729 
        eid = 0x14

Apr 10 2017 08:51:26.428952092 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0x6af05852217fb73a
        pool_context = 0x0
        time = 0x58eb2b6e 0x19914a1c 
        eid = 0x15

Apr 10 2017 08:52:28.842354770 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0x6af05852217fb73a
        pool_context = 0x0
        time = 0x58eb2bac 0x32355052 
        eid = 0x16

Apr 10 2017 08:52:43.026218614 sysevent.fs.zfs.trim_start
        version = 0x0
        class = "sysevent.fs.zfs.trim_start"
        pool_guid = 0x6af05852217fb73a
        pool_context = 0x0
        time = 0x58eb2bbb 0x1901076 
        eid = 0x17

Apr 10 2017 08:53:45.435618021 sysevent.fs.zfs.trim_finish
        version = 0x0
        class = "sysevent.fs.zfs.trim_finish"
        pool_guid = 0x6af05852217fb73a
        pool_context = 0x0
        time = 0x58eb2bf9 0x19f700e5 
        eid = 0x18



controller-c37acc6e home # cat /proc/spl/kstat/zfs/zfs-f66bb015-bb98-4e92-be9/trimstats 
60 1 0x01 5 240 2187888022415 2207324117009
name                            type data
extents                         4    0
bytes                           4    0
extents_skipped                 4    0
bytes_skipped                   4    0
auto_slow                       4    0
controller-c37acc6e home # zpool trim zfs-f66bb015-bb98-4e92-be9c-ae399dc6b0bd
controller-c37acc6e home # cat /proc/spl/kstat/zfs/zfs-f66bb015-bb98-4e92-be9/trimstats 
60 1 0x01 5 240 2187888022415 2286232552007
name                            type data
extents                         4    14868
bytes                           4    1992864441344
extents_skipped                 4    0
bytes_skipped                   4    0
auto_slow                       4    0
controller-c37acc6e home # zpool trim zfs-f66bb015-bb98-4e92-be9c-ae399dc6b0bd
controller-c37acc6e home # cat /proc/spl/kstat/zfs/zfs-f66bb015-bb98-4e92-be9/trimstats 
60 1 0x01 5 240 2187888022415 2359124416553
name                            type data
extents                         4    29738
bytes                           4    3985728876544
extents_skipped                 4    0
bytes_skipped                   4    0
auto_slow                       4    0
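
Since the trimstats counters are cumulative, the amount trimmed by a single run is the difference between two snapshots. A minimal sketch of that calculation, using the second and third snapshots above; the function names are illustrative, and the parser assumes every data row has the form `<name> <type> <value>`:

```python
def parse_trimstats(text):
    """Parse kstat text into {name: value}, skipping the two header lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields and a numeric value.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def trimstats_delta(before, after):
    """Per-counter increase between two parsed snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


after_first_trim = """\
60 1 0x01 5 240 2187888022415 2286232552007
name                            type data
extents                         4    14868
bytes                           4    1992864441344
extents_skipped                 4    0
bytes_skipped                   4    0
auto_slow                       4    0
"""

after_second_trim = """\
60 1 0x01 5 240 2187888022415 2359124416553
name                            type data
extents                         4    29738
bytes                           4    3985728876544
extents_skipped                 4    0
bytes_skipped                   4    0
auto_slow                       4    0
"""

delta = trimstats_delta(parse_trimstats(after_first_trim),
                        parse_trimstats(after_second_trim))
print(delta["extents"], delta["bytes"])
```

The second run adds 14870 extents and 1992864435200 bytes, almost exactly what the first run reported, which suggests each manual trim covers the same space again.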

@dweeezil dweeezil force-pushed the dweeezil:ntrim2 branch from 5f90dce to 66838d5 Apr 10, 2017

@dweeezil

Member Author

commented Apr 10, 2017

@fcrg Manual trim always trims everything. I did have a "only trim never-allocated space" option in earlier versions of the PR but have left it out, at least for the time being, for better alignment with the upstream PR.

@kpande

Contributor

commented Apr 10, 2017

@dweeezil on demand trim is not usable without that optimization for my use case

@dweeezil

Member Author

commented Apr 10, 2017

@kpande I've ported (untested) the partial trim patch to the current stack in dweeezil:ntrim2-with-partial-manual-trim. There are some subtle issues with this approach which I'll communicate to you off-list. This patch should behave just as it did before, otherwise.

@fcrg

commented Apr 10, 2017

@dweeezil Does "manual trim always trims everything" mean that trim doesn't work from the list of recently deleted items but instead always trims all of the unallocated space in the pool?

@skiselkov

Contributor

commented Apr 10, 2017

@fcrg Yes, that's exactly what it does. The idea behind this is to allow one to fully recover a device to a clean state, regardless of whether it has been used before or not.
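
To make that behaviour concrete: a manual trim as described here works from whatever space is currently unallocated, not from a log of recent frees. The toy sketch below shows the idea only; `free_extents` and the (offset, length) tuples are illustrative assumptions and have nothing to do with the PR's actual range tree code.

```python
def free_extents(size, allocated):
    """Return the (offset, length) gaps in [0, size) not covered by
    any allocated (offset, length) extent."""
    gaps = []
    cursor = 0
    for start, length in sorted(allocated):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        # Advance past this extent, tolerating overlapping input.
        cursor = max(cursor, start + length)
    if cursor < size:
        gaps.append((cursor, size - cursor))
    return gaps


# A region with two allocated extents: everything else would be trimmed.
print(free_extents(100, [(10, 20), (50, 10)]))

# An empty region is trimmed in full on every run.
print(free_extents(100, []))
```

Under this model an empty pool yields one extent covering the whole region each time, which matches the observation that trimstats grow by about the same amount on every run.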

@kpande

Contributor

commented Apr 10, 2017

@skiselkov the previous ZoL PR had a -s switch for zpool trim to skip unallocated portions.

@skiselkov

Contributor

commented Apr 10, 2017

@kpande Yeah, I saw that discussion. TBH, I see no problem in integrating this kind of feature upstream.

skiselkov and others added some commits Apr 19, 2017

Matt Ahrens' review comments.
Porting Notes:
Man page changes dropped for the moment.  This can be reconciled
when the final version is merged to OpenZFS.  They are accurate
now, only worded a little differently.
Async TRIM, Extended Stats
The blkdev_issue_discard() function has been provided by the
kernel for a long time but it only supports synchronous
discards.  The __blkdev_issue_discard() function provides
an asynchronous interface but was added in the 4.6 kernel.

Supporting only synchronous discards can potentially limit
performance when processing a large number of small extents.
To avoid this, an asynchronous discard implementation has been
added to vdev_disk.c which builds on existing functionality.

The kernel-provided synchronous version remains the default
pending additional functional and performance testing.

Due to the different mechanism used for submitting TRIM commands,
they were not being properly accounted for in the extended
statistics.  Resolve this by allowing aggregated stats to
be returned as part of the TRIM zio.  This allows for far
better visibility into the discard request sizes.

Minor documentation updates.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Matt Ahrens' review comments, round 3.
1) Removed the first-fit allocator.
2) Moved the autotrim metaslab scheduling logic into vdev_auto_trim.
2a) As a consequence of #2, metaslab_trimset_t was rendered superfluous. New
   trimsets are simple range_tree_t's.
3) Made ms_trimming_ts remove extents it is working on from ms_tree and then
   add them back in.
3a) As a consequence of #3, undone all the direct changes to the allocators and
   removed metaslab_check_trim_conflict and range_tree_find_gap.

Porting Notes:
* Removed WITH_*_ALLOCATOR macros and aligned remaining allocations
  with OpenZFS.  Unused variable warnings resolved with the gcc
  __attribute__ ((unused)) keyword.
* Added missing calls for ms_condensing_cv.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fix abd_alloc_sametype() panic
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Matt Ahrens' review comments, round 4:
1) Simplified the SM_FREE spacemap writing while a trim is active.
2) Simplified the range_tree_verify in metaslab_check_free.
3) Clarified comment above metaslab_trim_all.
4) Substituted 'flush out' with 'drop' in comment in metaslab_trim_all.
5) Moved ms_prev_ts clearing up to ms_cur_ts clearing in metaslab_trim_all.
6) Added recomputation of metaslab weight when metaslab is loaded.
7) Moved dmu_tx_commit inside of spa_trim_update_time.
8) Made the smallest allowable manual trim rate 1/1000th of a metaslab size.
9) Switched to using hrtime_t in manual trim timing logic.
10) Changed "limited" to "preserve_spilled" in vdev_auto_trim.
11) Moved vdev_notrim setting into zio_vdev_io_assess.

Porting Notes:
  * vdev_disk.c and zio.c hunks already applied.
  * nsec_per_tick -> MSEC2NSEC(1)
Tim Chase's review comments, round 2.
Porting Notes:
* metaslab_sync changes already applied.
* resync of test cases needed
Want manual trim feature to skip never-allocated space
Some storage backends such as large thinly-provisioned SANs are very slow
for large trims.  Manual trim now supports "zpool trim -p" (partial trim)
to skip metaslabs for which there is no spacemap.

Signed-off-by: Tim Chase <tim@chase2k.com>
Update and add additional TRIM test cases
The existing test cases were split into multiple test cases and
refactored.  There are now test cases for the following:

zpool_trim_001_pos - Verify manual TRIM
zpool_trim_002_pos - Verify manual trim can be interrupted
zpool_trim_003_pos - Verify 'zpool trim -s' rate limiting
zpool_trim_004_pos - Verify 'zpool trim -p' partial TRIM works
zpool_trim_005_neg - Verify bad parameters to 'zpool trim'
zpool_trim_006_neg - Verify bad parameters to 'zpool trim -r'
autotrim_001_pos   - Verify 'autotrim=on' pool data integrity
autotrim_002_pos   - Verify various pool geometries
manualtrim_001_pos - Verify manual trim pool data integrity
manualtrim_002_pos - Verify various pool geometries
manualtrim_003_pos - Verify 'zpool import|export'
manualtrim_004_pos - Verify 'zpool online|offline|replace'
manualtrim_005_pos - Verify TRIM and scrub run concurrently

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Review feedback
* Rename TRIM taskq threads to be more concise for Linux.
* Fix divide by zero panic

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Remove vdev_raidz_map_alloc()
Rather than hacking `vdev_raidz_map_alloc()` to get the child
offsets calculate the values directly.

Signed-off-by: Isaac Huang <he.huang@intel.com>
Review feedback 2
* Fixed missing taskq_destroy when exporting a pool which is
  being actively trimmed.
* Add auto/manual TRIM coverage to ztest.
* Temporarily disable manualtrim_004_pos.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add trim manpage
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Fix wrong logical operator
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Wait for 1 sec before check trim status
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Clean-ups following rebase to master
Signed-off-by: Tim Chase <tim@chase2k.com>
Add tags to trim test cases
NOTE: should be squashed into a previous commit
Preserve activation flags when sorting metaslabs
Also, comment-out the ASSERT(!metaslab_should_allocate(msp, asize));
in metaslab_group_alloc_normal().  It seems that the additional
metaslab_group_sort() performed by trim makes this assertion invalid.

@dweeezil dweeezil force-pushed the dweeezil:ntrim2 branch from 00c6144 to 586bdd9 Jan 2, 2019

@dweeezil

Member Author

commented Jan 2, 2019

Refreshed, finally, against a recent master. It does seem to work again; however, I've only given it very light manual testing and it currently fails at least one of the ZTS tests (I have not looked into those yet). I figured it would be a good idea to get it through the bots anyway and to show some current activity on this.

@dweeezil

Member Author

commented Jan 8, 2019

Replaced with #8255.

@dweeezil dweeezil closed this Jan 8, 2019
