txg_sync and z_rd_int/* become top I/O consumers, the host becomes unusable for a period of time #1538
Comments
Hi
@porbas I ported the plugin from FreeBSD; you can grab it here: https://raw.github.com/alexclear/ZoL-munin-plugin/master/zfs_stats_
@alexclear Thank you. You are great! :-)
@alexclear You might try setting the module option
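The option name got cut off above, but for reference, ZoL module options are set either persistently via /etc/modprobe.d or on the fly through sysfs. A minimal sketch of both mechanisms (zfs_txg_timeout is used purely as an illustration, not as the option the commenter meant):

```
# Persistent: read when the zfs module is loaded
echo "options zfs zfs_txg_timeout=5" >> /etc/modprobe.d/zfs.conf

# Runtime: most tunables can also be changed on the fly
echo 5 > /sys/module/zfs/parameters/zfs_txg_timeout
cat /sys/module/zfs/parameters/zfs_txg_timeout
```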
@alexclear is this still a problem with latest HEAD?
I appear to be seeing this on 0.6.5.3: load average is currently 40+, the pool is doing 1 GB/s of read throughput, all of it generated by the z_rd_int threads. If someone notices this in the very near future and there are details I can capture while it's occurring, please let me know. Otherwise, I'm going to have to bounce this host rather soon.
Whoops, just realized it's scrubbing. Still, it seems like the scrub is starving other processes (load avg 40, 35-40% iowait), which I don't recall being this dramatic a problem in the past.
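In case it helps anyone else who lands here mid-scrub: zpool status shows whether a scrub is running, and a scrub that is starving production I/O can be cancelled outright (pool name hypothetical):

```
# Is a scrub in progress, and how far along is it?
zpool status tank | grep -A 2 scan

# Cancel the running scrub; it can be restarted later with "zpool scrub tank"
zpool scrub -s tank
```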
These threads seem to be the likely suspects. This makes scrubs impractical in production, which is problematic.
Some tuning may be required, but real user IO will still be given priority.
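For what it's worth, the 0.6.5-era I/O scheduler exposes the scrub queue depth as module tunables, so scrub aggressiveness can be dialed down if the built-in prioritization isn't enough. A sketch, with values picked only for illustration:

```
# At most one outstanding scrub I/O per vdev, so user I/O wins more often
echo 1 > /sys/module/zfs/parameters/zfs_vdev_scrub_min_active
echo 1 > /sys/module/zfs/parameters/zfs_vdev_scrub_max_active

# Extra delay (in clock ticks) applied to scrub I/O when the pool is busy
echo 8 > /sys/module/zfs/parameters/zfs_scrub_delay
```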
Oddly, I've also noticed something similar, but I verified that it was not happening during a scrub:

```
$ sudo pidstat -d 5
Linux 4.2.0-1-amd64 (kdi)    12/06/2015    _x86_64_    (2 CPU)

12:16:34 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:16:39 AM     0        31      0.00      0.00      0.00       1  kswapd0
12:16:39 AM     0       150      0.00     16.00      0.00       1  jbd2/sda1-8
12:16:39 AM     0       669   1838.40      0.00      0.00       0  z_rd_int_0
12:16:39 AM     0       670   2048.00      0.00      0.00       0  z_rd_int_1
12:16:39 AM     0       671   1971.20      0.00      0.00       0  z_rd_int_2
12:16:39 AM     0       672   2037.60      0.00      0.00       0  z_rd_int_3
12:16:39 AM     0       673   1715.20      0.00      0.00       0  z_rd_int_4
12:16:39 AM     0       674   2068.80      0.00      0.00       0  z_rd_int_5
12:16:39 AM     0       675   1664.00      0.00      0.00       0  z_rd_int_6
12:16:39 AM     0       676   1995.20      0.00      0.00       0  z_rd_int_7
12:16:39 AM     0       714      0.00      0.00      0.00     245  txg_sync
12:16:39 AM   111       894   2236.80    931.20      0.00       0  transmission-da
12:16:39 AM  1000       931     36.00      0.00      0.00     158  kodi.bin
```

I'm on 0.6.5.2 with spl 0.6.5.1 (Debian 9), if that's important. I actually suspected that the "reads" from transmission-da were somehow being multiplexed to the z_rd_int_* kernel threads, but the numbers don't add up. Note that my ZFS pool consists of a single USB hard drive: no raidz, no mirroring, nothing fancy. So parity calculation or other such work (if the z_rd_int threads even do that) can't be the cause. EDIT: I don't know whether it's related, but I just saw this appear in my terminal for the very first time:
One data point that might help: this unit has only 2 GB of RAM. EDIT 2: Apparently these messages were related to stack traces, which I rescued from dmesg: https://gist.github.com/aktau/6e79ecca968a641eb60f
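Given the 2 GB of RAM, one plausible (if unconfirmed) thing to try is capping the ARC well below its default of half of RAM; the value below is only an example:

```
# Half of RAM (the default cap) is 1 GiB here; cap the ARC lower, e.g. 512 MiB:
# 512 * 1024 * 1024 = 536870912
echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max
```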
After reading #2952, I feel obliged to say that I'm using lz4 compression on my zpool. Here's the configuration: https://gist.github.com/aktau/fa6b5ef2fd40201a489b
Thanks for the update! I'm not running a self-built kernel module, though; I just grabbed it from the Debian Jessie repo the zfsonlinux project provides. I assume those are built with debugging support enabled, then? That said, I find it odd that my particular workload is causing this. Back-of-the-envelope: the only process doing significant reading is transmission-da. Apart from checksum verification and other assorted things it does, its main source of reads is sharing chunks, which I've capped at 100 KB/s (so give or take a few tens of KB/s). The HDD, slow and old though it is, should definitely be able to sustain that without reaching 50-90% utilization, I think.
Is anybody still observing this behavior with a recent version of ZFS?
Hi, yes. I installed ZoL today on Debian Jessie, along with Proxmox VE 4.3-3/557191d3. I am currently doing a Debian install within a VM; the copy kept slowing down and down, and I started wondering about host performance. I then checked syslog, which was showing lots and lots of

EDIT: oh, the dd returned; this is on an HP MicroServer, 8 GB RAM, 4 x 2 TB in raidz1:
```
errors: No known data errors
```
@hooliowobbits can you post your ZFS version? You can find it in
0.6.5.7-10_g27f3ec9 |
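For anyone else looking: the exact location got truncated in the comment above, but common spots to read the ZoL version include (any of these may have been intended):

```
cat /sys/module/zfs/version     # version of the currently loaded module
modinfo zfs | grep -iw version  # version of the installed module
dmesg | grep "ZFS:"             # version line logged when the module loaded
```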
Admittedly I had seen other users altering ashift, but not knowing much about that I left it at the default. EDIT:
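Since ashift came up: it can only be chosen at vdev creation time, but the value a pool is actually using can be checked after the fact. A sketch (pool and device names hypothetical; zdb reads the pool configuration from the zpool.cache file):

```
# Show the ashift recorded for each vdev
zdb -C tank | grep ashift

# ashift is fixed at creation, e.g. forcing 4K sectors on a new pool:
zpool create -o ashift=12 tank raidz1 sda sdb sdc sdd
```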
I think I have the same problem. The system is a hexa-core Xeon E5-1650 with 128 GB RAM and 2x 2 TB SATA disks, running kernel 4.4.40-1-pve and Proxmox 4.4-12 with ZFS module version 0.6.5.9-1. From time to time, tens of z_wr_int_* processes produce I/O on the disks, slowing down the host and leaving almost no room for organic I/O. If the NVMe is in the pool, VMs tend to crash because no I/O time is left for them; the NVMe accelerates the problem a lot. Interestingly, the processes shown in iotop are named "wr", yet iotop reports "disk reads" while atop reports writes. atop shows the disks as very busy:

I already posted in the Proxmox forum, but this seems to be a problem very specific to ZFS: https://forum.proxmox.com/threads/unerkl%C3%A4rliches-phantom-write-i-o-mit-zfs.33284/
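One way to check whether those phantom writes coincide with transaction-group syncs is the per-pool txg kstat (pool name hypothetical; the exact columns vary between releases):

```
# One row per recent txg: dirty bytes, bytes read/written, and the
# open/quiesce/sync durations in nanoseconds
cat /proc/spl/kstat/zfs/rpool/txgs

# Watch sync activity live from another shell
watch -n 1 'tail -n 5 /proc/spl/kstat/zfs/rpool/txgs'
```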
We have been experiencing a similar issue on Proxmox 4.x since last year. We even opened Proxmox support threads about it; this is the latest: https://forum.proxmox.com/threads/kvm-guests-on-zfs-freeze-during-backup-restore.34362/ We now think it's an I/O scheduling bug in ZFS.

Reproduction:

Mitigation:

We have also noticed that QCOW2 disks with cache=writeback are much less sensitive to the I/O starvation, from the KVM guests' point of view, than ZVOL RAW disks with cache=none, but it's not yet clear why:
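The reproduction steps themselves didn't survive above; a generic way to generate the kind of sustained ZVOL write pressure being described might look like this (pool and volume names hypothetical):

```
# Create a throwaway zvol and hammer it with large sequential writes
zfs create -V 20G rpool/iotest
dd if=/dev/zero of=/dev/zvol/rpool/iotest bs=1M count=16384 conv=fdatasync

# Meanwhile, watch per-vdev latency from another shell
zpool iostat -v rpool 2
```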
I have a dual Xeon E5645 system with 8 SATA disks; /boot is on a USB drive and / is on ZFS:
The zpool is:
This is a Debian 7.0-based system w/ZoL 0.6.1-1:
It is used mainly for virtualization; most datasets have sync=disabled and recordsize=4K
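For context, applying those per-dataset settings looks like this (dataset name hypothetical; note that sync=disabled trades the durability of the last few seconds of writes for latency):

```
zfs set recordsize=4K tank/vm-images
zfs set sync=disabled tank/vm-images

# Verify what a dataset is actually using
zfs get recordsize,sync tank/vm-images
```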
ARC is capped at 20G; an ARC utilization graph follows:
L2ARC size and efficiency graphs follow:
Disk load is almost constant during working hours, but we suffer from sporadic performance drops which do not seem to correlate with the actual load generated by the virtual machines. At least we have been unable to find a correlation yet.
Output of iotop -d 5 -P looks like this during these performance drop periods:
We also collected iostat -x -k 2 20 output here: https://gist.github.com/alexclear/5821955 and zpool iostat -v 2 20 (at the same time) output here: https://gist.github.com/alexclear/5821960
It seems like the system experiences a lot of internal I/O (mostly random reads) not consumed directly by userspace processes during these periods. It is also very odd that L2ARC efficiency remains quite low.
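One way to put a number on that L2ARC inefficiency is the arcstats kstat; the hit rate is l2_hits / (l2_hits + l2_misses). A minimal sketch:

```
# Raw L2ARC counters
grep -E '^l2_(hits|misses|size)' /proc/spl/kstat/zfs/arcstats

# Or a rolling view with the bundled arcstat.py (column names vary by version)
arcstat.py 5
```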