rsync causes ZoL to use all memory until system crashes STILL :( #3677
Comments
@gbkersey try going lower than half of your RAM for the ARC. I suggest testing 40% (around 6 GiB).
Also set zfs_arc_min to at least 1 GiB to prevent the ARC from collapsing, and set zfs_arc_meta_limit as well: unless things have changed, the default for metadata is 1/4, and I've set it to 1/3. Please post the output of /proc/spl/kstat/zfs/arcstats when this is happening. That's roughly 600 MiB of swap used; to make swap more efficient you could try zswap with lz4 compression.
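For anyone who wants to turn those fractions into byte values, here is a minimal sketch of the arithmetic. It assumes 16 GiB of RAM (as on the reporter's box) and assumes the 1/3 metadata fraction is taken relative to zfs_arc_max; the helper name `arc_options` is made up for illustration. It only prints a candidate `options zfs ...` line and does not touch the system.

```python
GIB = 1024 ** 3


def arc_options(total_ram_bytes, arc_percent=40, meta_divisor=3, arc_min_bytes=GIB):
    """Return suggested ARC module parameter values in bytes.

    Assumption: the metadata limit is expressed as a fraction of zfs_arc_max.
    """
    arc_max = total_ram_bytes * arc_percent // 100  # e.g. 40% of RAM
    return {
        "zfs_arc_max": arc_max,
        "zfs_arc_min": arc_min_bytes,                   # floor to avoid ARC collapse
        "zfs_arc_meta_limit": arc_max // meta_divisor,  # 1/3 of the ARC
    }


def modprobe_line(options):
    """Format the values as a single modprobe 'options' line."""
    pairs = " ".join(f"{name}={value}" for name, value in options.items())
    return f"options zfs {pairs}"


if __name__ == "__main__":
    print(modprobe_line(arc_options(16 * GIB)))
```

Integer division is used throughout so the values are whole byte counts that can be pasted into a modprobe configuration file.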
|
If ARC collapse is the problem, you might want to try dweeezil/spl@08807f8. |
I have made the changes to the zfs_arc parameters. I really don't think I need zswap: the problem is arc_adapt fighting with kswapd, and memory used by the ARC should not be swapped in the first place. I'll see how the backup run goes tonight. |
Here's what I'm seeing when the crash starts.....
/proc/spl/kstat/zfs/arcstats
|
Argh... Again....
|
@dweeezil I'll take a look at that, thanks. |
I just upgraded my system to 3.19 and the latest zfs daily and seem to be having a similar problem: about 20 minutes after boot (while tracker is setting up inotify watches on my home directory), arc_adapt suddenly shoots to 100% CPU usage and my desktop becomes unresponsive (though the mouse still moves). Switching to a TTY is nigh impossible, as the shell doesn't appear even after a 30-minute wait following a very slow login. It doesn't appear to be an out-of-memory situation, though I have the ARC constrained to 1 GB with 16 GB of RAM. This looks like a regression, since I recall the same issue happening previously. |
Have you tried removing the SSD devices? |
Argh... It died again....
arcstat.py showing arcsz > c
Nothing much going on with the file system though....
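The "arcsz > c" condition can also be checked straight from the kstat file without arcstat.py. The following is a rough sketch, assuming the usual three-column `name type data` layout of /proc/spl/kstat/zfs/arcstats; the sample text and its numbers are invented for illustration and are not output from this system.

```python
ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"

# Invented sample in the assumed kstat layout (not real output).
SAMPLE = """\
5 1 0x01 86 4128 1000000000 2000000000
name                            type data
size                            4    9126805504
c                               4    8589934592
c_max                           4    8589934592
"""


def parse_arcstats(text):
    """Map each kstat name to its integer value, skipping header lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields and a numeric last column.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_overshoot(stats):
    """Bytes by which the ARC size exceeds its target c (negative if below)."""
    return stats["size"] - stats["c"]


def read_arcstats(path=ARCSTATS_PATH):
    """Read and parse the live kstat file (only exists with ZFS loaded)."""
    with open(path) as handle:
        return parse_arcstats(handle.read())
```

Calling `arc_overshoot(read_arcstats())` in a loop while the backup runs would show when the ARC starts growing past its target.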
|
I've been having this problem for quite some time and I believe that I have found an easy solution. Hopefully this will save some other folks some time.
Here are the details:
Hardware:
Supermicro H8DCL-iF
16GB ECC RAM
2x CPU AMD Opteron(tm) Processor 4334
OS Disk - WDC WD10EZEX-08M2NA0 (1TB) 7200 rpm SATA drive @ 3.0Gb/s connected to the mobo
SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
Software:
Ubuntu 14.04.3 LTS
Linux sequoia 3.13.0-61-generic #100-Ubuntu SMP Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
ZFS version: 0.6.4.2-1~trusty
Zpool Hardware Info:
2x Marvell Technology Group Ltd. 88SE9480 SAS/SATA 6Gb/s RAID controller (rev c2)
6x HGST HDN724040ALE640 (4TB) 7200 rpm SATA drive @ 6.0Gb/s connected to Marvell controllers
2x SanDisk SDSSDRC032G (32GB) SSD SATA drive @ 6.0Gb/s connected to Marvell controllers
Zpool:
ZFS tuning:
options zfs zfs_arc_max=8589934592
Benchmark:
running lots of rsync backup jobs
Results:
After a couple of hours of running, arc_adapt starts taking 100% CPU. When this happens, the system runs out of RAM and starts swapping to the OS disk. The swapping is so severe that no other processes can access the OS disk and the system has to be power cycled in order to get it running again.
Solution (I hope):
I found a reply to #3320 by @kernelOfTruth that mentioned Tobi Oetiker's article on preserving the buffer cache state, http://insights.oetiker.ch/linux/fadvise/, which looked like it would solve the problem. The reply mentioned using rsync with Tobi's fadvise patch (--drop-cache), and that looked great. However, applying that patch on the backup server and using the --drop-cache option would require installing a patched version of rsync on all systems being backed up.
I started looking for references to that patch in the rsync mailing list and came upon this bug entry: https://bugzilla.samba.org/show_bug.cgi?id=9560#C3
It appears that @Feh took Tobi's rsync patch and built a wrapper called nocache: https://github.com/Feh/nocache. This wrapper appears to solve the problem, and I don't have to package and upgrade rsync on all of the hosts that are being backed up.
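For readers wondering what the fadvise approach actually does, here is a small sketch of the underlying call: read a file, then tell the kernel its cached pages are no longer needed via posix_fadvise with POSIX_FADV_DONTNEED. The function name `read_without_caching` is made up, and this is not the code from the rsync patch or from nocache; it only illustrates the mechanism. Whether the hint has any effect on the ZFS ARC, as opposed to the regular Linux page cache, is not something this sketch demonstrates.

```python
import os


def read_without_caching(path, chunk_size=1 << 20):
    """Read a file to the end, then ask the kernel to drop its cached pages.

    Returns the number of bytes read. The fadvise call is only a hint;
    the kernel is free to ignore it.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        # offset=0, length=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

A wrapper like nocache applies the same kind of hint around an unmodified program, which is why it avoids rebuilding rsync on every host.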
YMMV
Comments appreciated.... Thanks!