spl_taskq_thread_dynamic=1 zvol performance regression #470
Comments
OK, the problem is not quite gone; I still get load spikes like this occasionally:
While this is happening, the following processes are in D state:
Another similar occurrence:
sysrq-w stack traces:
A slowdown this morning lasted about 10 minutes (but I wasn't around to check what else could be seen):
arcstat.py from this period:
And some system level stats:
The vserver-hashify process was likely the cause of the high load on zd0, but I still don't get how the zvol can be at 100% "bandwidth" utilization if all backing disks are under 20%? Also, the entirety of data on zd0 would've fit into ARC:
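As a rough way to sanity-check the "would've fit into ARC" claim, the zvol's data footprint can be compared against the ARC's configured maximum. This is only a sketch; the dataset name below is a made-up placeholder for whatever zvol backs zd0.

```
# Hypothetical name: "tank/vm-root" stands in for the dataset behind zd0.
# Compare the zvol's data footprint with the configured maximum ARC size.
zfs get -H -o property,value used,volsize tank/vm-root
awk '$1 == "c_max" {print $3}' /proc/spl/kstat/zfs/arcstats
```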
@akorn This may very well have been fixed by the numerous ARC changes which have been made post-0.6.4.2. Your issue is being caused by the adapt thread not being able to free memory, likely metadata. You may want to try this with current master code but, unfortunately, there's a regression there which was caused by the manner in which it calculates … As to your particular issue, it would be interesting to know what the …
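The specific output being asked for here was not preserved in this thread. For readers following along, the ARC state this comment refers to can be inspected at runtime through the kstat interface; a minimal example (which counters matter most is my assumption, not a quote from the original comment):

```
# Dump the ARC counters most relevant to "the adapt thread can't free memory":
# overall size and target (size, c, c_max) plus the metadata accounting.
awk '$1 ~ /^(size|c|c_max|arc_meta_used|arc_meta_limit)$/ {print $1, $3}' \
    /proc/spl/kstat/zfs/arcstats
```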
@akorn Going back to the original problem: if setting spl_taskq_thread_dynamic=0 resolves it, then from what you've described my first guess would be that the dynamic taskq isn't ramping up the number of zvol threads fast enough. That would explain why disabling the dynamic taskqs and always leaving 64 threads running would help. Could you try setting the following options and see if that helps? This will cause new taskq threads to be created more aggressively.
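The specific module options suggested in that comment were lost when the thread was exported, so they are not reproduced here. As an illustration only, the taskq-related knobs SPL exposes on a given kernel can be enumerated like this (exact names vary between SPL releases):

```
# Show the taskq-related SPL module parameters this kernel actually exposes,
# together with their current values.
grep . /sys/module/spl/parameters/spl_taskq_thread_*
```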
@behlendorf, I'll see what I can do but this is not a box I can easily experiment with, and I haven't had the same problem on any of the others.
@akorn no problem, it was just a suggestion. I'm glad to hear the issue isn't more widespread. FWIW these are both runtime tunables, so you don't need to restart anything to try it quickly.
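A minimal sketch of what trying such a change at runtime looks like: loaded-module parameters are exposed under sysfs and can be rewritten in place. spl_taskq_thread_dynamic is used purely as the example here because it appears elsewhere in this thread; it is not necessarily one of the two options meant above.

```
# Read the current value of a runtime-tunable SPL module parameter...
cat /sys/module/spl/parameters/spl_taskq_thread_dynamic

# ...and change it on the fly, as root, without rebooting.
# Depending on the parameter, the new value may only apply to taskqs
# created after the change.
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic
```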
With the zvol rework merged, this should be effectively fixed, right?
Yes. Yes it is. Closing.
Hi,
after upgrading from an earlier 0.6.3 version without spl_taskq_thread_dynamic to one with spl_taskq_thread_dynamic=1 by default, I noticed very high load and poor interactive response times on one of my boxes.
Investigating further, I found that in
iostat -x 3
the zvols were reported at 100% %util while the physical disks were idle; await and svctime for the zvols were also high, in the thousands (sometimes independently, sometimes both). There were 32 zvol threads running (the configured maximum).
@DeHackEd suggested on IRC that the problem may have to do with spl_taskq_thread_dynamic, so I set it to 0 and rebooted, and the problem is gone.
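For anyone wanting to reproduce this workaround across reboots, the usual mechanism is a modprobe options line (the file name below is just a convention, not something specified in this issue):

```
# /etc/modprobe.d/spl.conf
# Set the parameter at module load time so it survives reboots.
options spl spl_taskq_thread_dynamic=0
```

If the spl module is loaded from an initramfs (for example on systems with root on ZFS), the initramfs typically needs to be regenerated for the option to take effect at boot.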
FWIW, some information about the box:
The box runs twenty-odd vserver guests; its swap as well as the shared root directory of the guests are on zvols. These are the only two zvols; there are about 260 zfs filesystems. Most write I/O is asynchronous.