rsync causes ZoL to use all memory until system crashes STILL :( #3677
I've been having this problem for quite some time and I believe that I have found an easy solution. Hopefully this will save some other folks some time.
Here are the details:
Ubuntu 14.04.3 LTS
Zpool Hardware Info:
2x Marvell Technology Group Ltd. 88SE9480 SAS/SATA 6Gb/s RAID controller (rev c2)
options zfs zfs_arc_max=8589934592
running lots of rsync backup jobs
After a couple of hours of running, arc_adapt starts taking 100% CPU. When this happens, the system runs out of RAM and starts swapping to the OS disk. The swapping is so severe that no other processes can access the OS disk and the system has to be power cycled in order to get it running again.
Solution (I hope):
I found a reply to #3320 by @kernelOfTruth that mentioned Tobi Oetiker's article on preserving buffer cache state, http://insights.oetiker.ch/linux/fadvise/, which looked like it would solve the problem. The reply mentioned using rsync with Tobi's fadvise patch (--drop-cache) and that looked great. However, applying that patch on the backup server and using the --drop-cache option would require that I install a patched version of rsync on all systems being backed up.
I started looking for references to that patch in the rsync mailing list and came upon this bug entry: https://bugzilla.samba.org/show_bug.cgi?id=9560#C3
It appears that @Feh took Tobi's rsync patch and built a wrapper called nocache https://github.com/Feh/nocache This wrapper appears to solve the problem and I don't have to package and upgrade rsync on all of the hosts that are being backed up....
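For anyone wondering what the fadvise approach boils down to, here is a minimal Python sketch of the idea: read a file, then hint to the kernel that its cached pages are no longer needed. This is only an illustration of `posix_fadvise` with `POSIX_FADV_DONTNEED`; it is not the code from nocache or from the rsync patch, and the function name `read_without_caching` is made up for this example.

```python
import os
import tempfile

CHUNK = 1 << 20  # read in 1 MiB pieces


def read_without_caching(path):
    """Read a file to the end, then ask the kernel to drop its cached pages.

    Returns the number of bytes read. The fadvise call is only a hint;
    the kernel is free to ignore it.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, CHUNK)
            if not chunk:
                break
            total += len(chunk)
        # offset=0, len=0 means "the whole file".
        # posix_fadvise is Unix-only, so guard for portability.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    # Demo on a throwaway file so nothing real is touched.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (3 * CHUNK))
    try:
        print(read_without_caching(tmp.name), "bytes read")
    finally:
        os.unlink(tmp.name)
```

The wrapper approach is attractive because it applies this kind of hint from outside the program, so rsync itself stays unmodified.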
@gbkersey try going lower than half of your RAM for ARC - I suggest testing 40% (around 6 GiB):
Also set zfs_arc_min to at least 1 GiB to prevent the ARC from collapsing, and set zfs_arc_meta_limit. If things haven't changed, the default for meta is 1/4; I've set it to 1/3.
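To make the arithmetic concrete, here is a rough Python sketch that turns those suggestions into byte values for a modprobe options line. It assumes 16 GiB of RAM (inferred from 8 GiB being called half, not stated in the thread) and treats the 1/3 meta limit as a fraction of zfs_arc_max, which is also an assumption. The helper names are hypothetical.

```python
GIB = 1024 ** 3


def arc_settings(ram_bytes, max_percent=40, min_bytes=GIB, meta_divisor=3):
    """Compute suggested ARC module parameters in bytes.

    max_percent:  share of RAM for zfs_arc_max (40% as suggested above)
    min_bytes:    floor for zfs_arc_min (1 GiB as suggested above)
    meta_divisor: zfs_arc_meta_limit = zfs_arc_max / meta_divisor
                  (assumption: the 1/3 is relative to zfs_arc_max)
    """
    arc_max = ram_bytes * max_percent // 100
    return {
        "zfs_arc_max": arc_max,
        "zfs_arc_min": min_bytes,
        "zfs_arc_meta_limit": arc_max // meta_divisor,
    }


def modprobe_line(settings):
    """Format the settings as a single 'options zfs ...' line."""
    params = " ".join(f"{name}={value}" for name, value in settings.items())
    return "options zfs " + params


if __name__ == "__main__":
    # Assumed RAM size; adjust to the real machine.
    print(modprobe_line(arc_settings(16 * GIB)))
```

Integer arithmetic is used throughout so the values come out as whole byte counts, the same form as the zfs_arc_max line in the original report.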
Please post output of /proc/spl/kstat/zfs/arcstats when this is happening.
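If it helps when capturing that output, here is a small Python sketch that parses arcstats and flags the case where the ARC size exceeds its target. It assumes the usual layout of name, type and data columns and the field names `size`, `c`, `c_min` and `c_max`; the function names are made up for this example.

```python
import os

ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"


def parse_arcstats(text):
    """Parse arcstats text into {name: int}.

    Data rows have three columns (name, type, data). The kstat header
    lines either have a different column count or a non-numeric third
    column, so they are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, _type, value = fields
        try:
            stats[name] = int(value)
        except ValueError:
            continue
    return stats


def arc_over_target(stats):
    """True when the current ARC size is above the target size c."""
    return stats["size"] > stats["c"]


def summarize(stats):
    """One-line summary of the fields most relevant to this issue."""
    mib = 1024 ** 2
    parts = [f"{k}={stats[k] // mib}MiB"
             for k in ("size", "c", "c_min", "c_max") if k in stats]
    return " ".join(parts)


if __name__ == "__main__":
    if os.path.exists(ARCSTATS_PATH):
        with open(ARCSTATS_PATH) as fh:
            current = parse_arcstats(fh.read())
        print(summarize(current))
        if "size" in current and "c" in current:
            print("ARC over target:", arc_over_target(current))
    else:
        print("arcstats not found; is the zfs module loaded?")
```

Running this in a loop while the backups are going would show whether size creeps above c before arc_adapt starts spinning.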
That's roughly 600 MiB of swap used. To make swap more efficient, you could try using zswap with lz4 compression.
I have made the changes to the zfs_arc parameters....
I really don't think I need to use zswap. The problem is arc_adapt fighting with kswapd and really any memory being used by arc should not be swapped.
I'll see how the backup run goes tonight.
Here's what I'm seeing when the crash starts.....
I just upgraded my system to 3.19 and the latest zfs daily and seem to be having a similar problem. About 20 minutes after boot (while tracker is setting up inotify watches on my home directory), arc_adapt suddenly shoots to 100% CPU usage and my desktop becomes unresponsive (though the mouse still moves). Switching to a TTY is nigh impossible, as the shell doesn't appear even after a 30-minute wait following a very slow login.
It doesn't appear to be an out-of-memory situation, though I have the ARC constrained to 1 GB with 16 GB of RAM. This looks like a regression, since I recall the same issue happening previously.
Argh... It died again....
arcstat.py showing arcsz > c
Nothing much going on with the file system though....