ZVol with master branch source 2015-09-06 and kernel 4.1.6 comes to grinding halt. #3754
@dracwyrm thanks for testing out the updated zvol code in the master branch. I'm sorry to hear you ran into problems. Do you happen to know if anything was logged to the console of the ZFS system when you encountered these hangs? That would go a long way towards helping us identify the root cause.
@dracwyrm I assume that the host itself has not deadlocked. Could you provide backtraces from your kernel threads when the guests have deadlocked? http://zfsonlinux.org/faq.html#HowDoIReportProblems Ideally, your kernel would be built with CONFIG_DEBUG_INFO=y and CONFIG_FRAME_POINTER=y.
@dracwyrm Also, what kind of block device is your KVM guest using (e.g. IDE, SATA, SCSI, virtio) and which version of Windows is this?
I failed to reproduce this with
That wasn't necessary. I was able to reproduce this with Doing
@dracwyrm No need for those backtraces, although information about your guest Windows version and guest storage device would be useful.
Some more debugging last night revealed that the install actually finishes; it just takes about half an hour to get past 0%. I had assumed we had somehow become too fast and were triggering a race, but testing with a patch to put

My test configuration is Linux 4.1.3, QEMU 2.4.0 and Windows 7 Professional x64.
Unfortunately, I was unable to reproduce this issue at all in my test environment; everything worked smoothly. @dracwyrm could you post the environment you were testing with? My test configuration is Linux 3.19, QEMU 2.2.0 and Windows 10 x64.
My environment is: the guest is Windows 10 Pro, with a dedicated NVIDIA graphics card passed through via PCI passthrough. The USB keyboard and mouse are also passed through until I get Synergy up and running.

It was the guest that was deadlocking. It would just be frozen in place, not doing anything, and would not respond to commands. Sometimes I was able to get the install done, but then trying to do anything in the guest would be impossible; even sending Ctrl+Shift+Esc to open Task Manager would do nothing for ages. When I switched back to the latest release, no issues at all; it works perfectly normally.

I just remembered this important bit! I also tried the master branch with all commits up to and including the 31st of August, and it too had freezing problems, but it wasn't as bad as with the 6th of September checkout.

Hope that helps. Is there anything more you need?

libvirt settings (should be all you need)

My ZFS configuration:
Oh. My ZVol settings: |
I tried several things. I switched to ck-sources, mainly because Kernel of Truth on the Gentoo forums says he has good luck with it and is active here as well. I increased the folder watch limit. I just now tried the very latest code in the spl/zfs master branches. It resulted in very poor performance of the VM; in fact, it was so poor it just went to a black screen, and then the monitor said there was no video input.

I reinstalled 0.6.4.2 and rebooted. Everything ran perfectly. I even ran a benchmark just minutes after switching back to 0.6.4.2 from the master branch: http://imgur.com/l8kRCWp It's over 9000 (that's actually a Steam achievement unlock). I am typing this from the VM as well.

I admit that I don't know much about how file systems are implemented, so forgive me if this is not a smart question. Before, I always did in-kernel builds of spl/zfs, but with the changes in kernel 4.1, the 0.6.4.2 release does not work when compiled in; however, you can compile it separately as a module. The commit log shows 4.1 and 4.2 compatibility fixes, so I tried embedding spl and zfs into the kernel the same way I always do. It compiled fine with no errors. However, on rebooting the host system, the screen would be blank right after the BIOS logos: no text, no errors, just nothing. The same exact kernel compiled with the same config, minus spl/zfs being built in, boots straight away.

My question is: is there still some incompatibility left that is causing this slowdown, one that compiling zfs into the kernel finds straight away but that takes a specific condition to trigger when compiled as a module? Or is this a completely separate bug altogether that I need to file a separate report on? Thanks.
@dracwyrm could you try with a virtio HDD in Windows? I.e. attach two CDs as AHCI or IDE (one the Windows installation media, the other virtio-win.iso from the Fedora project), and define your virtual HDDs as virtio rather than AHCI (or IDE), for example:
This would be generated from libvirt definition similar to:
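The XML from this comment did not survive formatting; as a hedged illustration (the pool name, zvol path, and target device below are made up, not taken from this thread), a virtio disk stanza in a libvirt domain definition typically looks like:

```xml
<!-- Illustrative sketch only: adjust the zvol path and target name to your pool. -->
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/zvol/tank/vm/windows-c'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

The CD-ROM devices would stay on IDE/SATA just for installation, per the advice below.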
This is because emulation of both AHCI and IDE is known to be slow in qemu. Perhaps this poor emulation is hitting something in ZFS, making it even slower. It is generally recommended to avoid AHCI (or IDE) emulation except for installation purposes. During installation you will need to load drivers from the virtio-win CD in order to allow Windows to "see" the virtio HDD.

FWIW, the above configuration is very similar to my own (virtual) Windows 7 computers, which I use daily with no problems. My host is Arch Linux with kernel 4.0.9 and ZoL 0.6.4.2 + #3718, qemu 2.4.0 and libvirt 1.2.18.
@behlendorf this might be a compatibility issue between ZoL and kernel 4.1. Can you try reproducing with newer kernel versions?
I changed the HDD from virtio-scsi to plain virtio. It's a bit better: writes are fine, but reading seems to be really slow. Also, it seems like only one program can access the disk at a time, because if two start to read from it, it slows down a lot. The mouse also jumps around when there is disk activity. It hasn't slowed to a grinding halt yet, but it does seem to come close.

It's also strange that virtio-scsi on v0.6.4.2 was perfectly fine. It might be some missed issue that my setup happens to find; as I mentioned before your comment, I can't compile SPL/ZFS into the kernel anymore, and this is using the latest master source with a 4.1.x kernel. I have yet to try a 4.2 kernel. Whatever it is really makes itself known when I compile it in, as the computer doesn't boot at all then. Thanks for all your help in this.

To put this last: do these numbers seem right? Write speeds in the megabytes and reads in the kilobytes? I don't know how to get it to display formatted text right, sorry. It's the output of zpool iostat -v.
@dracwyrm
@Bronek I just managed to kill ZFS completely. I recompiled the kernel with the deadline scheduler, since I read that it's best for KVM. Apparently not for ZFS, as it came to a grinding halt like before. I had to go back to the 0.6.4.2 versions to have a working VM. I was using BFQ before, as it's the default in my kernel. Lesson learned on that one.
I am using the deadline scheduler with 0.6.4.2. I hope the root cause of this is identified; it would be a pity if this behaviour landed in 0.6.5 because we were not able to find the commit which caused it.
@Bronek Yeah. It would be. I had very high hopes for the new ZVol code which would make VMs a lot faster. |
@dracwyrm You should stick to 'noop' for ZFS. Originally I thought BFQ would help, but it was actually slower when I ran tests. You can see the results below:
@fearedbliss good point. I just learned that, if a whole disk is set up for ZFS, it will switch its scheduler to noop even if the default is different. However, if only a partition is set up for ZFS, it will leave the default scheduler. @dracwyrm are you using whole disks for ZFS? If so, what scheduler is set for these disks? For example, on my system:
What happens if you change the scheduler at runtime to noop, before starting your VMs? Here is how to do it:
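The sysfs interface referred to here is standard; as a sketch (the `sdf` device name and the `active_scheduler` helper are illustrative, not from this thread):

```shell
#!/bin/sh
# /sys/block/<dev>/queue/scheduler lists the available schedulers, with the
# active one in brackets, e.g. "noop deadline [cfq]". This helper extracts
# the active one from such a file.
active_scheduler() {
    sed -e 's/.*\[\(.*\)\].*/\1/' "$1"
}

# To switch a disk to noop at runtime (as root):
#   echo noop > /sys/block/sdf/queue/scheduler
#   cat /sys/block/sdf/queue/scheduler
```

The change is not persistent; it reverts at reboot unless set via udev rules or the `elevator=` boot parameter.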
|
@Bronek If by whole-disk ZFS you mean "zpool create tank /dev/sda", then no. I don't want to rely on (nor do I like) GRUB 2. I basically use the whole-disk concept, but on a partition. Two things: I don't want /boot inside ZFS, and having swap inside ZFS (on a zvol) causes my server and laptop to crash. Once I removed swap from ZFS, my systems magically became completely stable. The layout is as follows:

/dev/sda1 /boot ext2 250 MB (extlinux as bootloader)

And yup, you are right about the scheduler being changed. Normally my scheduler is CFQ, since my kernel's scheduler is set to CFQ by default (and I didn't let ZFS partition my drive). I use elevator=noop at boot time.
@Bronek @fearedbliss This is what the system does automatically, as I have never touched individual scheduler settings: Mental note: since it switched to noop automatically, can I then use BFQ on the main drive for performance of the host system...
@dracwyrm It's weird that it doesn't switch it; I'm not sure whether that is intended, but you should file another bug for it. Based on the results I posted above, I wouldn't recommend using any scheduler other than noop if you are using ZFS.
@fearedbliss Setting all drives to noop did not help at all. I switched back to v0.6.4.2 and set sde/sdf to noop, and v0.6.4.2 does perform a bit better now. So: same exact kernel, same scheduler, just two different versions of ZFS/SPL. The latest makes the VM so slow that it stalls out and virt-manager thinks the VM is paused, but it can't be unpaused. I will file a separate bug report for the log/cache drives not being switched to noop, because I notice a difference even on v0.6.4.2.
Just for grins, I tried 0.6.5, with the same results. Then I tried a 4.2 kernel, and the results were even more disastrous: the video wouldn't even come on, and virt-manager reported high CPU usage. I'll double-check my config to see if anything was missed. I did switch the 4.2 config to a noop-only configuration.
@dracwyrm I regret that we did not identify the cause of this issue before 0.6.5 was tagged. This appears to be a critical data-loss issue, the first in the project's history to make it into a tagged production release.

What happens is that there is a bug in the zvol rework's handling of non-aligned discard requests: ZoL tries to truncate them to block boundaries, but the size is not updated, so the request is effectively shifted up to a block boundary. If the difference is N bytes, then N bytes past the request's end point will be discarded, causing data loss while still incurring the read-modify-write overhead that this optimization was intended to avoid. #3798 should fix this. It has been backported to Gentoo via sys-fs/zfs-kmod-0.6.5-r1.

My apologies for the issues this caused. This was an extremely subtle bug that passed review and our current regression tests. The regression tests should be updated to catch this kind of regression before the next release.
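To make the failure mode concrete, here is a small arithmetic sketch. The 8 KiB volblocksize and the offsets are assumed values chosen purely for illustration: rounding the start of a misaligned discard up to a block boundary without shrinking its length pushes the end of the discarded range past what the caller asked for.

```shell
#!/bin/sh
BS=8192                      # assumed volblocksize
START=6144; LEN=4096         # caller asks to discard [6144, 10240)

# Round the start up to the next block boundary:
ALIGNED_START=$(( (START + BS - 1) / BS * BS ))   # 8192
# Bug: LEN is not reduced to match, so the discard now covers [8192, 12288)
BUGGY_END=$(( ALIGNED_START + LEN ))
OVERSHOOT=$(( BUGGY_END - (START + LEN) ))
echo "$OVERSHOOT"            # bytes destroyed past the requested end
```

With these assumed numbers the script prints 2048: two kilobytes of live data beyond the request's end would be discarded.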
@dracwyrm I think it's almost certain your issue was caused by #3833. Sorry we didn't get to the bottom of this sooner, it will be addressed in 0.6.5.2 and it would be great if you could confirm for us it fixed the issue. The required patches are all currently in the master branch if you'd like to test sooner. |
Closing, this issue is believed to be resolved. @dracwyrm please let us know if that's not the case with 0.6.5.2.
Hi, I tested 0.6.5.4 with both a 4.1.x and a 4.4 kernel, and this issue still persists. Sorry, I hadn't had much time to play with new versions of ZFS. This issue is not closed for me. I also removed block devices from being controlled by cgroups in the libvirtd config file to see if that would help, and it didn't.

My next step in debugging is to remove patches until I find the culprit. I will go back to 0.6.5.0 and work my way backwards. This will take a long time. Can you please reopen?

-Jon
I have a question about something that I don't understand. I unload all these modules: zfs, zunicode, zavl, zcommon, znvpair, spl. However, if I restart the host computer, the performance of the guest is back to a grinding halt; the disk benchmarks drop to a crawl (from roughly 900 MB/s throughput to about 300 MB/s). So I tried again: reinstalled v0.6.4.2, restarted the host, and the guest was back to normal speed.

My question is: how can restarting the machine versus reloading the modules make such a difference? The new modules are fully loaded, so the guest is running on a zvol driven by the new code and not the old, so I would expect the same performance issue. The only thing I can think of is that the guest was started on the old drivers, so everything was loaded in memory or the ARC; then the new drivers were loaded with everything still in memory or the ARC, and on reboot whatever was in memory got wiped out. If so, is this an error in the transfer of data to memory/the ARC? Thanks.
That's a good question. If you're able to remove the kernel modules and load new ones, then you've definitely wiped any existing ZFS state from the system; we have to tear everything down before they can be unloaded. It sounds like perhaps something else is going on, but I'm not sure what.
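For reference, a dry-run sketch of the reload procedure being described. The module names come from the comment above; the script only prints the commands rather than running them, since actually unloading requires root and all pools exported:

```shell
#!/bin/sh
# Print the unload commands in reverse dependency order: zfs depends on the
# helper modules, and everything depends on spl, so spl must come out last.
# (modprobe -r also resolves dependencies itself; being explicit just makes
# the order visible.)
for m in zfs zunicode zavl zcommon znvpair spl; do
    echo "modprobe -r $m"    # remove the 'echo' to actually unload
done
```

After reloading, `lsmod | grep zfs` should show the freshly built modules with a zero use count until a pool is imported.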
Hi all, I have good news and bad news. The good: I found the patch that is affecting VM guest performance. The bad: you may not like which one it is.

First, my testing methodology. It was this one:

I read the description and it said people experience a 50 to 80% increase in IOPS. Was this under a VM guest, or just a zvol mounted under Linux running a test suite? Also, there were other bug reports opened after mine, once 0.6.5 was officially released, saying that VM guest performance was poor after the upgrade. So, since I wasn't the only one having issues, can this patch be reverted until the cause of the poor KVM guest performance is found? Sorry, but it breaks VMs, and I wasn't the only one affected.

Thanks,
@dracwyrm Could you please describe the tests you're running in a bit more detail? Is this simply a matter of noticing that a Windows guest boots very slowly? I'm wondering how hard this might be to reproduce in a controlled testing environment. Also, are you continuing to use the same 3-vdev pool with mirrored logs and 2 cache devices shown above? Have you configured qemu (via libvirt) to access the zvols with direct IO (cache='none')? Have you run "perf top" or any other sort of diagnostics on the host while the problem is occurring?

FWIW, as another data point, I run Windows 7 guests fairly regularly with their storage on virtio zvols and cache='none' and haven't noticed any performance regressions. I'll admit, however, that's mainly my "seat of the pants" feeling. Finally, did you run any benchmarks on the zvols from the host?
@dweeezil I'll try. It's late and recompiling all day is not fun, so my grammar isn't exactly the greatest. I'm using the same RAID/log/cache config as before. I use CrystalDiskMark to benchmark performance in the guest, so it's not entirely seat of the pants: with SPL/ZFS v0.6.4.2 I would get an average of 900 MB/s sequential read speed, and with v0.6.5.x I get about 400 MB/s. The problem is constant slow performance of the disk. It becomes so slow that even the mouse is affected; it jumps from point to point as I try to move it. I haven't tried perf top while this is going on; I can try tomorrow and see.

I don't have any zvols mounted on the host. I do have a dataset mounted on the host with files I want under RAID protection. I don't access it very often, but I do know that transfers to and from it are very fast; the initial transfer of about 1 terabyte of data took a very short time, basically maxing out what the SATA bus was capable of.
@dracwyrm I ran my first very small series of tests. On a 4.1.6 kernel running a 64-bit Win7 guest, spl and zfs compiled with

Obviously my test rig (on my "toy" test system) is quite different from your production system, but I wanted to post a baseline set of numbers. They seem to indicate that, at least for this particularly contrived benchmark, there is virtually no performance difference between the commit prior to 37f9dac and current master. My next steps are to pinch the ARC size way down and to increase the test size in CDM from the default of 1MB to something larger, in order to force a lot more disk IO to happen.
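For anyone wanting to reproduce this, the ARC can be capped via the standard `zfs_arc_max` module parameter; a sketch (the 512 MiB figure matches the constraint mentioned later in this thread, and the value is in bytes):

```shell
#!/bin/sh
# Compute 512 MiB in bytes, the unit zfs_arc_max expects.
ARC_MAX=$(( 512 * 1024 * 1024 ))
echo "$ARC_MAX"

# On a live system, as root, with the zfs module loaded:
#   echo "$ARC_MAX" > /sys/module/zfs/parameters/zfs_arc_max
# Or persistently, in /etc/modprobe.d/zfs.conf:
#   options zfs zfs_arc_max=536870912
```

Shrinking the ARC during benchmarking forces far more real disk IO, which is the point of the test described above.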
@dweeezil I forgot to mention that I have since moved to kernel 4.1.15, but it doesn't matter, as it's been the same story with all kernels; I even tried kernel 4.4 with ZFS 0.6.5.4. Maybe you could create a couple of loopback drives to replicate a RAIDZ1 with separate log/cache? Reads/writes to a RAID system are a lot different.

Output of perf top while running CrystalDiskMark v5.1.1:
Cheers. |
Also, just to show how tired I am: I forgot to mention that the patch in question introduced a bug about the size not being updated. Before, the size was calculated as a function argument, but the patch put the size variable as the function argument. This is why I applied the following patch, to give this version of the source code a fair trial.
The performance is as described above: very poor. I hope this shows that I tried to give this version of the source code a completely fair trial. Something in this patch is really affecting my performance.

I then went further. I downloaded the whole source tree at this point in time and compared it to the latest source at the head of the 0.6.5 branch as of a few hours ago. I went through the differences the patch in question made, against the same files in the latest of this branch (not the master branch), to see if there was anything else major that would really affect performance. There is a later commit about speed-ups, but the comment for the patch in question says there is already a speed-up of 50 to 80%, so in theory I should be seeing those same speed-ups instead of slowdowns.

Cheers.
@dracwyrm I am also using zvols as the underlying storage for my VMs, but it works fine for me; perhaps there are differences in our setups which are beneficial for my VMs. Here is what I use:
@dracwyrm I'm a bit concerned about all the CPU time being spent in

The reason I ask is because I recently had a chance to try PCI passthrough of a graphics card on a newer machine with a CPU supporting VT-d (with the guest storage on a zvol with ZoL 0.6.5.?, which means I had the 37f9dac zvol code). The guest's vcpus were always at 100% utilization, and perf on the host showed lots of time being spent in
After constraining the ARC to 512MiB, I ran another set of tests with the single-vdev pool, using the threaded sequential tests in CrystalDiskMark (Q32T2 settings) and a 4GiB test file. With 782b2c3 (before the new zvol code) it showed 937.5/761.3, and with current master code (4b9ed69) it was 1038/719. Those numbers seem close enough to be considered identical for this micro-benchmark. I'll set up a raidz1 now and try the same thing.
I am also using vfio (for GPU passthrough) and have no such problems. |
I am using VFIO with graphics card passthrough, so this is the same situation I am in. This is the performance issue that I am talking about; you experienced what I have been trying to say all along! @Bronek has no issues, though, but you and I have. I have used Task Manager with ZFS v0.6.4.2 and it does not show 100% CPU usage. I forgot to look when under the new ZFS, as it takes ages to do anything and I get frustrated.

The difference is that now the patch that caused this issue has been found. The question is: what is happening in this patch that makes things go haywire for these two setups, the one you tried and mine? Is there a conflict between the new BIO code and qemu/kvm? I have used both qemu 2.4 and 2.5. Is there a kernel option that I haven't configured that needs to be set, or even unset? If that's the case, then maybe there's a way to detect it at ZFS compile time, like some other settings?

The patch seems sound, apart from the size issue that was fixed later. If it really is this patch plus certain configurations, then the only causes I can see are (a) the removal of the 35-thread thing, or (b) the new BIO code. So that leads me to believe there is a kernel config option involved. Is there a way to limit BIO threads? I have tried using block-storage cgroups set by libvirt, and then disabling them, but both ways yield the same results. I seem to hit the one configuration that this patch doesn't like, and it is this patch, because anything before it is fine. So, is there a way for me to revert some of the changes out of the new versions and still have it work with the new code? That way I can test later code to see whether the issue still hits.

If it helps, I am using Gentoo Linux with gentoo-sources, and I do use noop, since earlier in this thread I was told to use it. Also, my searches on the Internet say it's good for SSDs, which is my main drive. Then I have two SSDs for the log/cache partitions.

Jon
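On the thread-count question: which zvol tunables exist depends on the exact release (the rework removed the old zvol taskq, and later point releases changed this again), so rather than assume a parameter name, it is safer to check what the loaded module actually exposes. A small sketch; `/sys/module/zfs/parameters` is the standard sysfs location, and the helper takes the directory as an argument only so it can be exercised anywhere:

```shell
#!/bin/sh
# List zvol-related module parameters (name=value) found in a parameters
# directory, e.g. /sys/module/zfs/parameters on a system with zfs loaded.
list_zvol_params() {
    for f in "$1"/zvol_*; do
        [ -e "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# On a live system:
#   list_zvol_params /sys/module/zfs/parameters
```

If a `zvol_threads` entry shows up, it can be set at module load time via `options zfs zvol_threads=N` in /etc/modprobe.d/zfs.conf.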
@dracwyrm I'd like to clarify that I've only tried VFIO with GPU passthrough a single time, and it performed as you described. I've not tried it with an older version of ZoL. Can you confirm that if you run your guest with the standard spice/qxl video stack, the performance is OK?
Well, I tried it without passthrough and I had the same degraded performance. I'm well and truly stumped on this. What precisely is going on in this patch that would cause this kind of incompatibility with my system? Is it because the maximum thread count was removed, so it's left at the defaults defined in bio.h (I think it was 256)? Is there a kernel setting I don't have?
I messed around with kernel settings and libvirt settings (I switched to directsync), and now my perf top looks like this while very intensive disk writes/reads are going on:
The raw spinlock seems to be heavy.
Hi,
In my setup, I have KVM using zvols on a Gentoo Linux system with gentoo-sources 4.1.6. I have been using the 0.6.4.2 versions of SPL and ZFS. The tank is built with RAIDZ1 on three spinning HDDs, with external log and cache devices on two SSDs.
The commit logs show that there have been speed improvements for zvols, so I thought I would give it a try. I downloaded the source for SPL and ZFS via the download-source button that GitHub has and renamed it to something like zfs-20150906.zip, then used the Gentoo ebuilds as a base to install the new versions. I figured source downloads like that would allow me to choose the date for an update, rather than using a live ebuild. Naturally, I restarted the machine to make sure the new modules were fully loaded and the old ones were out of memory.
The VMs would work for a minute and then they would come to a grinding halt. The Windows circle of dots would not spin nor could I really move the mouse that is passed through via USB.
I wanted to do a full reinstall of Windows anyway, and all my data was backed up, so I completely destroyed the tank and did a secure erase to wipe the drives. I then created a new tank using the updated ZFS binaries and modules, hoping this would help. I tried reinstalling Windows, but the installation would not get very far before things ground to a halt again.
Here's the strange bit. I also have regular datasets on the same tank, and those worked faster than ever. I even transferred 750 Gigs of data to one dataset with no slowdowns at all. It's only the ZVols that gave bad performance.
I have since reinstalled the 0.6.4.2 versions of SPL and ZFS (without recreating the tank), started the virtual machine, and it runs as fast as ever. No slowdowns at all.
Cheers.