
Disable direct reclaim on zvols #669

Closed
wants to merge 1 commit

3 participants

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

Previously, it was possible for the direct reclaim path to be invoked
when a write to a zvol was made. When a zvol is used as a swap device,
this often causes swap requests to depend on additional swap requests,
which deadlocks. We address this by disabling the direct reclaim path
on zvols.

This closes issue #342.

@ryao ryao referenced this pull request Apr 16, 2012
Closed

Support swap on zvol #342

@pyavdr
pyavdr commented Apr 16, 2012

On my openSUSE 12.1 system, this patch doesn't work. I increased min_free_kbytes too and patched zvol.c, but with memtester the system freezes as soon as it hits the swap space. I tried it twice: once it froze at 270 MB, the next time at 670 MB.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

@pyavdr Does your kernel have CONFIG_PREEMPT_VOLUNTARY=y set? Preemptible kernels will still deadlock, but the solution for that should come with the resolution of issue #83.

@pyavdr
pyavdr commented Apr 16, 2012

This is a default kernel: CONFIG_PREEMPT_NONE=y is set, while CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT are not set. So I need to recompile the kernel to test your patch.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

CONFIG_PREEMPT_NONE=y is what should be set. How much RAM do you have? What is your pool configuration? How quickly does it freeze without this patch? If you are booting off ZFS, did you remember to rebuild your initramfs?

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

@pyavdr I just tried running memtester on my system and it works. Note that I also have patches from pull requests #618, #651 and #660 installed, so they might be the reason my system is stable while yours is not. In particular, I suspect that pull request #618 is responsible. Pull request #660 might also play a role.

@pyavdr
pyavdr commented Apr 16, 2012

OK, I use 12 GB in a VM session and can go up to 25 GB when needed (the pool is the latest version: 4 drives, no raidz/mirror, 10 GB zfsswap volume). I would like to try those patches too, but I can't figure out how to apply all these various patches in their different states (I edited your patch manually), so I need some time to apply them. The instructions for applying all these patches are scattered all over. If you find the time, maybe you could put them together into a single patch; that would make them easier to apply. But besides that, if you have really found it, congratulations!

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

@pyavdr You can get the code with all of these patches from the gentoo branch of my fork:

https://github.com/gentoofan/zfs/tree/gentoo

I intend to snapshot that branch when I feel that ZFS is ready to enter Gentoo's testing tree.

By the way, pull request #660 resolves a stability issue that predominantly affects systems with more than 8GB of RAM. It seems like it might address your issue. Alternatively, you could reduce the amount of RAM that you give to your virtual machine.

@pyavdr
pyavdr commented Apr 16, 2012

OK, I applied #618, #660 and #669, set min_free_kbytes to 131072, used 12 GB of RAM, and started memtester... the system does not freeze completely, but almost: there is some write traffic to the zfsswap volume, but I can't move the cursor or type on the command line.
It did not work.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

How much RAM are you asking memtester to use? It calls mlock(), which prevents the kernel from reclaiming that RAM; it is possible that this is what is killing your system. On my system with 8GB of RAM, I only let memtester take half of it. There was significant lag when it did this because of mlock(), although it went away after the kernel finished reorganizing system memory.

Also note that 131072 is the correct value for my system, but the correct value for yours could be higher given that you have more RAM.

@pyavdr
pyavdr commented Apr 16, 2012

OK, I applied #651 and set vm.min_free_kbytes=512000. Started memtester with 2000/4000/6000/8000/10000: no problem, it runs through, no swap needed. Started memtester with 12000: freeze, with some traffic on the zfsswap devices. After a few minutes my Windows 7 host gave a BSOD (memory management), and I had to reboot the whole system.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

I cannot say that this is surprising. Memtester makes allocations that cannot be swapped, so when you run it, that memory is effectively removed from your system. Giving it nearly all of your RAM like you did would likely kill Linux regardless of the filesystem used.

You need to find something else to do allocations that can be swapped. A few instances of python -c "print 2**10**10" will likely suffice. I had originally assumed that you were using memtester alongside something like this because of memtester's ability to reduce your system's effective memory.

@pyavdr
pyavdr commented Apr 16, 2012

OK, I started instances of python -c "print 2**10**10" & one at a time, waiting 20 seconds between each... after 7 instances the system hits the swap space, which leads to a freeze.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

What is the default setting for vm.min_free_kbytes? Did you try increasing it?

@pyavdr
pyavdr commented Apr 16, 2012

I started some hours ago with the default of around 68000, increased it to 131072, and finally to 512000.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

Are you certain that you properly patched the kernel modules and that you are running tests with the updated setting? If you are using code from my Git repository, did you run git checkout gentoo before building anything?

@pyavdr
pyavdr commented Apr 16, 2012

Yes, I'm pretty sure. To merge the patches, I edited the specific files (.../module/zfs/arc.c ...) with the changes from the patches; these are only a few lines, no problem. After that: make; make install. Make finds the changes, compiles and links, and make install copies the new files to their destination; then I reboot. In recent weeks I have changed several source files such as zpool.c and zfs.c, so that procedure works as expected. Just to be sure, I now tried it from scratch: make clean; ./configure; make; make install; reboot. After starting the python processes, it freezes again. I don't know the git interface yet and need to look into it; there is already a book lying here on my desk :-).

@behlendorf
ZFS on Linux member

Thanks for digging into this, I suspected something like this was going on. There have been a number of similar deadlocks which were resolved by disabling direct reclaim where appropriate. In general, I've tried to do this in as targeted a manner as possible.

As I'm sure you saw, I have been forced to resort to setting PF_MEMALLOC on occasion. Considerable care needs to be taken when doing this to ensure no other bits get mistakenly cleared or set in current->flags. For this reason, it's usually better to target the specific memory allocations which are causing the issue. Direct reclaim can be disabled for those kmem_alloc()'s by using KM_PUSHPAGE instead of KM_SLEEP.

In this case I'm not 100% sure it will be possible to pass KM_PUSHPAGE to all the offending allocations under zvol_write() but it's worth investigating since it may be cleaner. Getting a full stack of the exact deadlock would be helpful.

However, if we stick with your proposed patch, you're going to need to move the setting of PF_MEMALLOC to the very top of the function, or better yet into a wrapper function. Both zil_commit() and zfs_range_lock() have the potential to enter direct reclaim, and they are currently outside the scope of the flag.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

@behlendorf I am not certain if a wrapper function would be appropriate given that there is only a single call to zvol_write(), but it might be useful to write a helper macro that takes a variable, its type, some flags to set temporarily in that variable, a function pointer and the arguments to that function. It could be used to wrap the zvol_write() call, but I am not sure where the appropriate place would be to put this macro.

I have modified the patch to address the issues that you highlighted. I have also updated the pull request message at the top. In theory, we could use thread local storage to control the value used in all allocations that currently use KM_SLEEP, so that we can flip them to KM_PUSHPAGE on demand. That would work well with the wrapper macro idea.

In addition, I suspect that the reason that I need to set vm.min_free_kbytes is because indirect reclaim can fail. I believe that results in the additional deadlock that I have observed with this patch where the system is not immediately crippled. I think that we can solve that by modifying the SPL to maintain a pool of pages as per /proc/sys/kernel/spl/vm/swapfs_reserve and to provide a thread local storage flag that ZFS can set to permit indirect reclaim to draw from those pages. That probably should be a separate patch.

@ryao
ZFS on Linux member
ryao commented Apr 16, 2012

On second thought, maintaining a pool of pages in the SPL that are released on demand would suffer from a race condition where another thread could steal the pages meant for ZFS on SMP systems. This could also happen on uniprocessor systems where preemption is possible if we are not careful. Addressing that could require implementing a memory allocator for ZFS, which should be able to guarantee that pages reserved for ZFS would only be used by ZFS.

@ryao
ZFS on Linux member
ryao commented Apr 17, 2012

I have done some additional testing. This patch permits a single disk pool to use swap on a zvol if vm.min_free_kbytes is sufficiently high, but it does not appear to have the same effect on a pool with a single 6-disk raidz vdev on my server, which has 16GB of RAM. Increasing the value of vm.min_free_kbytes to 1048576 permits some amount of writing to swap, but then all writes will stop as what appears to be a soft deadlock occurs. Running the reboot command as root before the soft deadlock becomes a hard deadlock appears to return the system to a sane state during the shutdown process.

@behlendorf
ZFS on Linux member

@gentoofan Your updated patch doesn't do what you think it does. The zvol_dispatch() function dispatches the zvol_write() function to be executed in the context of one of the zvol taskq threads. So you're setting the PF_MEMALLOC flag in the dispatching thread, but that will have no effect since zvol_write() will be done by one of the taskq worker threads. If we're going to take the PF_MEMALLOC approach, this bit must be set in zvol_write(). I'd suggest a wrapper function like this.

__zvol_write()
{
        /* Existing zvol_write() implementation */
}

zvol_write()
{
        if (current->flags & PF_MEMALLOC) {
                error = __zvol_write();
        } else {
                current->flags |= PF_MEMALLOC;
                error = __zvol_write();
                current->flags &= ~PF_MEMALLOC;
        }

        return (error);
}

This should also resolve your indirect reclaim case for kswapd, which will already have set PF_MEMALLOC. See commit 6a95d0b for a better explanation of this race; we fixed a similar subtle issue in the mmap code the same way.

Longer term, I think the best way to address this is still to use KM_PUSHPAGE in all the offending allocations. This is what all the other filesystems in the Linux kernel do; they must be very careful not to allocate any memory in the write path. If they absolutely have to, this flag can be used, which really maps to GFP_NOFS.

@behlendorf
ZFS on Linux member

Related to this, I still really want to see a stack to ensure we're addressing the real deadlock here. Do you happen to have a trivial reproducer for a VM? I'm set up to get a stack.

@ryao
ZFS on Linux member
ryao commented Apr 17, 2012

@behlendorf Thanks for catching that. I will revise the patch shortly.

As for reproducing this, give the VM 2G of RAM and do this:

  1. zfs create -o primarycache=metadata -V 2G rpool/swap
  2. mkswap -f /dev/zvol/rpool/swap
  3. swapon /dev/zvol/rpool/swap
  4. python -c "print 2**10**10"
@ryao
ZFS on Linux member
ryao commented Apr 17, 2012

@behlendorf I have pushed a revised version of my patch, but I think that this still needs more work.

Swap appears to work properly on both my desktop and my server, and there is no longer any need to edit vm.min_free_kbytes. Unfortunately, I was able to observe a hard deadlock on my desktop when running an instance of python -c "print 2**10**10" for each of its 4 logical cores simultaneously.

I believe that a deadlock can occur where other threads consume pages as indirect reclaim frees them, starving the kernel thread that needs the pages to be able to swap. I think that can be fixed by implementing a TLS flag in the SPL that would enable us to flip allocations to KM_NOSLEEP. This will guarantee that allocations fall back on emergency memory pools, which should prevent the deadlock I observed under load.

@ryao
ZFS on Linux member
ryao commented Apr 18, 2012

It looks like using PF_MEMALLOC is inappropriate:

http://lkml.indiana.edu/hypermail/linux/kernel/0911.2/00576.html

The main issue is that PF_MEMALLOC permits ZFS to take pages out of ZONE_DMA. That could cause a crash by exhausting pages available for DMA. We should be able to address this issue by flipping allocations to use KM_NOSLEEP instead of setting PF_MEMALLOC.

@behlendorf
ZFS on Linux member

Exactly right, PF_MEMALLOC is a bit of a last resort and we should avoid using it if at all possible. The two places in the existing code where I was forced to use it are where I was unable to modify the exact point of allocation because it was in the kernel proper. Setting PF_MEMALLOC allowed me to work around the issue without forcing people to patch their kernels.

Anyway, back to this particular issue. I agree the best solution is to pass the proper flags at all the offending allocation points. KM_PUSHPAGE should be enough for this; I don't think we need to resort to KM_NOSLEEP, which comes with its own issues. We just need to avoid a deadlock due to reentering reclaim while we're writing out pages. I'll try to get some stacks tomorrow, which I expect will make the issue a bit more concrete.

@ryao
ZFS on Linux member
ryao commented Apr 18, 2012

@behlendorf If you have a kernel patch to eliminate the need for PF_MEMALLOC in ZFS code that is ready for upstream, I could try talking to Greg Kroah-Hartman about sending it to Linus Torvalds for inclusion.

Greg is a Gentoo developer and he might be willing to assist my ZFS efforts in Gentoo.

@behlendorf
ZFS on Linux member

@gentoofan Yes and no. Ricardo and I started a nice thread on linux-mm with Andrew Morton and got everyone to agree that this is in fact a real bug which should be fixed. However, the right fix (in the thread) is pretty invasive and ends up touching all the various arches, which makes it a testing nightmare. Anyway, since I was able to work around it (which I need to do for older kernels anyway), I stopped pushing the issue. Also, since it relates to vmalloc(), which is something we need to stop using heavily in the long run, I didn't feel it was worth the fight. Still, I encourage you to read the thread.

http://marc.info/?l=linux-mm&m=128942194520631&w=4

@ryao
ZFS on Linux member
ryao commented Apr 18, 2012

@behlendorf I tried modifying the code to use KM_PUSHPAGE, but the system will not write to swap:

https://github.com/gentoofan/spl/commits/gentoo
https://github.com/gentoofan/zfs/commits/spl-swap

I am still examining this, but any thoughts that you might have would be appreciated.

@ryao
ZFS on Linux member
ryao commented Apr 18, 2012

Increasing vm.min_free_kbytes to 524288 enables my new patchset to swap. Without that, the system refuses to write to swap, but it does not hard deadlock immediately. Lower values might also work, although I have not tested them yet.

Also, the deadlock involving 4 simultaneous python processes does not appear to occur with my new patchset.

@behlendorf
ZFS on Linux member

To be honest, I'm not a big fan of the tsd approach. Having the lower layers modify the passed flags is asking for problems in my view. Plus the tsd code is already rarely used in zfs and I've been tempted a few times to remove it. I'd prefer to either:

A) Just set PF_MEMALLOC

B) Explicitly pass KM_PUSHPAGE for all impacted allocations. This might be a little broad, but it is hardly any worse than disabling reclaim for all zvol writes.

I will try to spend some time on this myself over the next week or two.

@ryao
ZFS on Linux member
ryao commented Apr 19, 2012

The following appears to work:

  1. dd if=/dev/zero of=/swap bs=4096 count=2097152
  2. losetup -f /swap
  3. mkswap /dev/loop0
  4. swapon /dev/loop0
  5. python -c "print 2**10**10"

I guess the question should be what the loopback device does that zvols fail to do.

@ryao
ZFS on Linux member
ryao commented Apr 19, 2012

@behlendorf It looks like the loopback device works because of the following lines in taskq_thread() in the SPL, which set PF_MEMALLOC:

   /* Disable the direct memory reclaim path */
   if (tq->tq_flags & TASKQ_NORECLAIM)
           current->flags |= PF_MEMALLOC;
@ryao
ZFS on Linux member
ryao commented Apr 19, 2012

I noticed that the loopback device sets nice to -20. I patched the SPL to use that as well and ran my stress tests. My desktop no longer deadlocks when running 4 simultaneous python processes, so I have opened a pull request with zfsonlinux/spl:

zfsonlinux/spl#99

Addressing the issue with PF_MEMALLOC taking pages from ZONE_DMA is important, but that probably would be best addressed as part of a more comprehensive fix.

@ryao
ZFS on Linux member
ryao commented Apr 19, 2012

I have revised my patch to set the flag that is being used in the codepath taken by the loopback device. I have also revised the commit message to reflect that. I am now at the point where I feel that this should close issue #342.

@ryao ryao Disable direct reclaim on zvols
Previously, it was possible for the direct reclaim path to be invoked
when a write to a zvol was made. When a zvol is used as a swap device,
this often causes swap requests to depend on additional swap requests,
which deadlocks. We address this by disabling the direct reclaim path
on zvols.

This closes issue #342.
544a9a4
@behlendorf
ZFS on Linux member

Right, this has basically the same effect as the previous PF_MEMALLOC patches, however it applies more broadly to all I/O issued for the zvol. The previous patches strictly limited this to the write path.

ZFS on Linux member
ryao replied Apr 19, 2012

This might be necessary. Under heavy memory pressure, it is possible for a page fault to occur, triggering a read that will enter direct reclaim. It is not clear to me that the page fault path does not hold locks that are needed for direct reclaim. If it does hold such locks, then the virtual memory subsystem could deadlock.

This would explain the hard deadlock that occurred with the previous patch when I ran 4 simultaneous python processes to eat memory by performing a large calculation. I am auditing the kernel virtual memory code to check for this possibility, but it is possible that the combination of my lack of familiarity with the code and its sheer size will cause me to miss a case.

ZFS on Linux member
ryao replied Apr 20, 2012

I was wrong to only consider reads. This kind of deadlock can also occur between discard and write. It also does not need to involve locks in kernel code. If the direct reclaim path is invoked while a key lock is held in ZFS's lower layers, such as the DMU, then the thread holding the lock would deadlock and all subsequent operations would fail.

I was able to easily deadlock my system with a python process for each core when using the older code, but I am no longer able to produce that deadlock with the newer code. I will keep looking at this, but fixing a single possible deadlock will likely not be enough to make this change unnecessary.

ZFS on Linux member

@ryao I agree, this might be the cleanest fix. And if it holds up to your testing, and we're unable to identify the exact offending lock I'm OK with merging it.

I just wish we could clearly identify the offending lock which is resulting in the deadlock. Then we'd be able to target the fix more precisely. If I were to make a guess, I'd wager on zfs_range_lock(). This would explain why you had such good success with your initial patch, which didn't cover the zil_commit() call in zvol_write(). Also, the write lock is only taken for the write and discard cases, which agrees with your previous observations.

If zfs_range_lock() is to blame, it appears as if we can move some of the offending allocations outside the lock. The remaining ones can perhaps be safely flagged with KM_PUSHPAGE to avoid the deadlock. Commenting out the zfs_range_lock()/zfs_range_unlock() calls would be enough to test the theory. You would lose atomicity on the zvol, but that's probably OK for the sake of a test.

@behlendorf
ZFS on Linux member

@ryao So what's the latest on this change? I lost track of the latest testing. Is setting TASKQ_NORECLAIM enough to resolve most issues? If so, I'm not averse to merging it since it clearly does help, although I suspect this will need more work.

@ryao
ZFS on Linux member
ryao commented Apr 30, 2012

@behlendorf Setting TASKQ_NORECLAIM eliminated all issues that I have encountered with swap on zvols. The only known issue is the theoretical issue of DMA pages being consumed by ARC.

The dma-kmalloc* entries in my desktop's /proc/slabinfo only show a single slab consuming DMA pages after several days of uptime and several instances of heavy swap usage. This suggests to me that crashes caused by DMA page consumption would be incredibly rare in practice:

slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
delayed_node 0 0 328 24 2 : tunables 0 0 0 : slabdata 0 0 0
extent_map 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
extent_buffers 0 0 224 18 1 : tunables 0 0 0 : slabdata 0 0 0
extent_state 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_free_space_cache 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_path_cache 0 0 144 28 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_transaction_cache 0 0 416 19 2 : tunables 0 0 0 : slabdata 0 0 0
btrfs_trans_handle_cache 0 0 72 56 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_inode_cache 0 0 1328 24 8 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 0 0 824 19 4 : tunables 0 0 0 : slabdata 0 0 0
fat_cache 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0
ip6_dst_cache 100 100 320 25 2 : tunables 0 0 0 : slabdata 4 4 0
UDPLITEv6 0 0 1216 26 8 : tunables 0 0 0 : slabdata 0 0 0
UDPv6 104 104 1216 26 8 : tunables 0 0 0 : slabdata 4 4 0
tw_sock_TCPv6 0 0 320 25 2 : tunables 0 0 0 : slabdata 0 0 0
TCPv6 60 60 2112 15 8 : tunables 0 0 0 : slabdata 4 4 0
nv_stack_t 85 90 12288 2 8 : tunables 0 0 0 : slabdata 45 45 0
fuse_request 0 0 624 26 4 : tunables 0 0 0 : slabdata 0 0 0
fuse_inode 0 0 832 19 4 : tunables 0 0 0 : slabdata 0 0 0
xfs_inode 0 0 1088 30 8 : tunables 0 0 0 : slabdata 0 0 0
xfs_efd_item 0 0 400 20 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_buf_item 0 0 224 18 1 : tunables 0 0 0 : slabdata 0 0 0
xfs_trans 0 0 280 29 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_da_state 0 0 488 16 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_log_ticket 0 0 216 18 1 : tunables 0 0 0 : slabdata 0 0 0
nfs_direct_cache 0 0 176 23 1 : tunables 0 0 0 : slabdata 0 0 0
nfs_write_data 38 38 832 19 4 : tunables 0 0 0 : slabdata 2 2 0
nfs_read_data 0 0 768 21 4 : tunables 0 0 0 : slabdata 0 0 0
nfs_inode_cache 0 0 1152 28 8 : tunables 0 0 0 : slabdata 0 0 0
rpc_inode_cache 0 0 960 17 4 : tunables 0 0 0 : slabdata 0 0 0
reiser_inode_cache 0 0 880 18 4 : tunables 0 0 0 : slabdata 0 0 0
ext4_inode_cache 0 0 1096 29 8 : tunables 0 0 0 : slabdata 0 0 0
ext4_xattr 0 0 88 46 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_free_data 0 0 56 73 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_allocation_context 0 0 136 30 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_prealloc_space 0 0 120 34 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_io_end 0 0 1128 29 8 : tunables 0 0 0 : slabdata 0 0 0
jbd2_journal_handle 0 0 24 170 1 : tunables 0 0 0 : slabdata 0 0 0
jbd2_journal_head 0 0 112 36 1 : tunables 0 0 0 : slabdata 0 0 0
jbd2_revoke_table 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
jbd2_revoke_record 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
cfq_io_cq 429 429 104 39 1 : tunables 0 0 0 : slabdata 11 11 0
cfq_queue 374 374 232 17 1 : tunables 0 0 0 : slabdata 22 22 0
bsg_cmd 0 0 312 26 2 : tunables 0 0 0 : slabdata 0 0 0
mqueue_inode_cache 16 16 1024 16 4 : tunables 0 0 0 : slabdata 1 1 0
hugetlbfs_inode_cache 24 24 680 24 4 : tunables 0 0 0 : slabdata 1 1 0
kioctx 0 0 448 18 2 : tunables 0 0 0 : slabdata 0 0 0
dnotify_mark 338 338 152 26 1 : tunables 0 0 0 : slabdata 13 13 0
dio 50 50 640 25 4 : tunables 0 0 0 : slabdata 2 2 0
pid_namespace 60 60 2120 15 8 : tunables 0 0 0 : slabdata 4 4 0
UDP-Lite 0 0 1024 16 4 : tunables 0 0 0 : slabdata 0 0 0
ip_fib_trie 292 292 56 73 1 : tunables 0 0 0 : slabdata 4 4 0
UDP 64 64 1024 16 4 : tunables 0 0 0 : slabdata 4 4 0
tw_sock_TCP 224 304 256 16 1 : tunables 0 0 0 : slabdata 19 19 0
TCP 370 425 1920 17 8 : tunables 0 0 0 : slabdata 25 25 0
blkdev_integrity 36 36 112 36 1 : tunables 0 0 0 : slabdata 1 1 0
blkdev_queue 60 60 2088 15 8 : tunables 0 0 0 : slabdata 4 4 0
blkdev_requests 397 505 352 23 2 : tunables 0 0 0 : slabdata 23 23 0
blkdev_ioc 306 306 120 34 1 : tunables 0 0 0 : slabdata 9 9 0
fsnotify_event_holder 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0
bip-256 7 7 4224 7 8 : tunables 0 0 0 : slabdata 1 1 0
bip-128 0 0 2176 15 8 : tunables 0 0 0 : slabdata 0 0 0
bip-64 0 0 1152 28 8 : tunables 0 0 0 : slabdata 0 0 0
bip-16 0 0 384 21 2 : tunables 0 0 0 : slabdata 0 0 0
sock_inode_cache 1428 1428 768 21 4 : tunables 0 0 0 : slabdata 68 68 0
file_lock_cache 285 285 208 19 1 : tunables 0 0 0 : slabdata 15 15 0
net_namespace 48 48 2560 12 8 : tunables 0 0 0 : slabdata 4 4 0
shmem_inode_cache 1543 1722 760 21 4 : tunables 0 0 0 : slabdata 82 82 0
Acpi-ParseExt 2856 2856 72 56 1 : tunables 0 0 0 : slabdata 51 51 0
Acpi-State 102 102 80 51 1 : tunables 0 0 0 : slabdata 2 2 0
Acpi-Namespace 1428 1428 40 102 1 : tunables 0 0 0 : slabdata 14 14 0
task_delay_info 1998 2100 136 30 1 : tunables 0 0 0 : slabdata 70 70 0
taskstats 96 96 328 24 2 : tunables 0 0 0 : slabdata 4 4 0
proc_inode_cache 6193 6358 744 22 4 : tunables 0 0 0 : slabdata 289 289 0
sigqueue 1950 1950 160 25 1 : tunables 0 0 0 : slabdata 78 78 0
bdev_cache 64 64 1024 16 4 : tunables 0 0 0 : slabdata 4 4 0
sysfs_dir_cache 21924 21924 144 28 1 : tunables 0 0 0 : slabdata 783 783 0
inode_cache 1498 1848 680 24 4 : tunables 0 0 0 : slabdata 77 77 0
dentry 199402 210276 216 18 1 : tunables 0 0 0 : slabdata 11682 11682 0
buffer_head 741 741 104 39 1 : tunables 0 0 0 : slabdata 19 19 0
vm_area_struct 45848 46536 168 24 1 : tunables 0 0 0 : slabdata 1939 1939 0
mm_struct 563 578 960 17 4 : tunables 0 0 0 : slabdata 34 34 0
files_cache 632 828 704 23 4 : tunables 0 0 0 : slabdata 36 36 0
signal_cache 481 728 1216 26 8 : tunables 0 0 0 : slabdata 28 28 0
sighand_cache 411 525 2176 15 8 : tunables 0 0 0 : slabdata 35 35 0
task_xstate 694 980 576 28 4 : tunables 0 0 0 : slabdata 35 35 0
task_struct 720 828 1728 18 8 : tunables 0 0 0 : slabdata 46 46 0
anon_vma_chain 40528 42075 48 85 1 : tunables 0 0 0 : slabdata 495 495 0
anon_vma 19830 20628 112 36 1 : tunables 0 0 0 : slabdata 573 573 0
radix_tree_node 8697 9632 568 28 4 : tunables 0 0 0 : slabdata 344 344 0
idr_layer_cache 660 660 544 30 4 : tunables 0 0 0 : slabdata 22 22 0
dma-kmalloc-8192 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4096 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2048 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1024 0 0 1024 16 4 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 16 16 512 16 2 : tunables 0 0 0 : slabdata 1 1 0
dma-kmalloc-256 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-192 0 0 192 21 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-8192 16410 17964 8192 4 8 : tunables 0 0 0 : slabdata 4491 4491 0
kmalloc-4096 397 512 4096 8 8 : tunables 0 0 0 : slabdata 64 64 0
kmalloc-2048 1009 1120 2048 16 8 : tunables 0 0 0 : slabdata 70 70 0
kmalloc-1024 3915 3936 1024 16 4 : tunables 0 0 0 : slabdata 246 246 0
kmalloc-512 1254 1328 512 16 2 : tunables 0 0 0 : slabdata 83 83 0
kmalloc-256 7999 13280 256 16 1 : tunables 0 0 0 : slabdata 830 830 0
kmalloc-128 5787 11360 128 32 1 : tunables 0 0 0 : slabdata 355 355 0
kmalloc-64 260022 261504 64 64 1 : tunables 0 0 0 : slabdata 4086 4086 0
kmalloc-32 36095 42752 32 128 1 : tunables 0 0 0 : slabdata 334 334 0
kmalloc-16 4864 4864 16 256 1 : tunables 0 0 0 : slabdata 19 19 0
kmalloc-8 70704 205312 8 512 1 : tunables 0 0 0 : slabdata 401 401 0
kmalloc-192 99450 130998 192 21 1 : tunables 0 0 0 : slabdata 6238 6238 0
kmalloc-96 3021 7098 96 42 1 : tunables 0 0 0 : slabdata 169 169 0
kmem_cache 42 42 192 21 1 : tunables 0 0 0 : slabdata 2 2 0
kmem_cache_node 192 192 128 32 1 : tunables 0 0 0 : slabdata 6 6 0

@behlendorf
ZFS on Linux member

Awesome, then I'll merge this patch into master since it's clearly safe and improves stability. It may be a little broad, but we can always revisit this later if it leads to issues such as larger latencies. Thank you again for all your testing of this change and for iterating with me on a reasonable fix.

@behlendorf
ZFS on Linux member

Merged as commit ce90208

@behlendorf behlendorf closed this Apr 30, 2012