Faster `zfs list -o name -s name` output #5558

Open
ryao opened this Issue Jan 3, 2017 · 4 comments

Projects

None yet

4 participants

@ryao
Member
ryao commented Jan 3, 2017 edited

I have heard complaints from former prospective and current users about zfs list output being slow when outputing tens of thousands of datasets. zfs list is slow because it buffers the full output from the kernel in user space, sorts it and then prints it. This gives us the memory footprint of a breath first search and causes output to be delayed until both reading and sorting have finished. I believe that it is possible to make this several times faster and lower latency between command execution and the first item being printed by orders of magnitude in the typical case where we sort by name.

The technique is a hybrid DFS-BFS. Instead of fetching the first item in a level of the SPA namespace, recursing on it and sorting after everything has been read, we fetch the entire first level like a BFS, sort, print the first item in the sorted order and if it has children, recurse like a DFS before continuing to the next item that we print before recursing if it has children.

Lets say that n is the number of items being printed, each level contains an average of m items and there p levels (such that n = m * k^p). In which case, I can show this algorithm having multiple benefits over the current one:

  1. Lets say that n is the number of items being printed, each level contains an average of m items and there p levels (such that n = m * k^p). The total memory complexity is O(m*p) rather than O(n), which is a p/k^p factor decrease in memory required.
  2. This also allows us to to reduce the number of allocations to O(m*p) because we could recycle defer freeing memory allocations until the operation has finished. That way the number of allocations would be the same as the amount of memory used.
  3. The time before the first element is printed becomes O(m*p) rather than the time to list and sort all items.
  4. The total time complexity of the sorting algorithm becomes O(n/c * log(n)) instead of O(n * log(n)). We do n/m sorting operations that are each O(mlogm) and we can substitute n = m^c to yield O(n/c * log(n)). This means that the last element is printed by this algorithm before the last element is printed by the current algorithm.

This is only possible for the default sort by name because the SPA hierarchy can be exploited to both skip comparisons and place a dataset into its final position before the sorting operation has finished. It only makes a difference when there number of children of any datasets are kept small. Otherwise, it would degrade to the current behavior.

I am filing this issue to solicit feedback ahead of implementing it.

@DeHackEd
Contributor
DeHackEd commented Jan 3, 2017

One reason 'zfs list' buffers is that indentation needs of the columns is potentially unknown until all data has been collected. While the algorithm for crawling the disk may be faster, ZFS does need to buffer everything in order to do its pretty printing (-H notwithstanding)

@ryao
Member
ryao commented Jan 3, 2017 edited

@DeHackEd Good point. This would still work in the case of zfs list -o name -s name. Before ClusterHQ failed, a colleague had approached me about why this was slow and in his case, all he needed was zfs list -o name -s name. I have updated the title to reflect that it applies to that.

@ryao ryao changed the title from Faster `zfs list` output to Faster `zfs list -o name -s name` output Jan 3, 2017
@richardelling

Just a few comment and encouragement. In previous lives, we've done something similar.

There are two parts to the problem:

  1. getting all of the names/props out fast
  2. sorting for UI

From a UI perspective, the "zfs" CLI becomes cumbersome after a few dozen datasets or properties. In the prior lives, we build UIs that run in browsers, so we did the sorting and optional filtering in the presentation layers, thus reducing the work to just getting the names/props out fast. Or, put another way, for tens of thousands of datasets, consider not using "zfs" CLI. For the poor souls who screen-scrape "zfs" CLI, this can certainly help.

@ryao as you work through this, consider if there is a good place in libzfs_core for some of the functionality.

@mailinglists35
mailinglists35 commented Jan 8, 2017 edited

i have applied pr 5560 but for me it made some difference but it still feels slow.

before patch:

me: zfs, show me the datasets!
zfs: i'm thinking, i'm thinking, i'm thinking, i'm thinking, [zfs uses 100% cpu, dbu_evict uses 15% cpu]
zfs: here is the whole list!
total time 29 seconds (first run after import), 19 seconds (next runs)

after patch:

me: zfs, show me the datasets!
zfs: i'm thinking, i'm thinking, [zfs uses 100% cpu, dbu_evict uses 15% cpu]
zfs: here is what i've got so far
zfs: i'm thinking, i'm thinking, [zfs uses 100% cpu, dbu_evict uses 15% cpu]
zfs: here is what i've got so far
zfs: i'm thinking, i'm thinking, [zfs uses 100% cpu, dbu_evict uses 15% cpu]
zfs: here is what i've got so far
total time 19 seconds (first run after import), 12 seconds (next runs)

i have a dataset recursive structure like this, and the times above are run for only one of these which has exactly 870 datasets (time zfs list -H -o name -d2 dir1/dir2/dir3/dir4/dir5/dir6)
dir1/dir2/dir3/dir4/dir5/dir6/{800 datasets}
dir1/dir2/dir3/dir4/dir5/dir7/{800 datasets}
dir1/dir2/dir3/dir4/dir5/dir8/{800 datasets}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment